Please read README.md before running the code. This repository uses Git Large File Storage (LFS) to track large data files; if LFS is unavailable, please use the provided link to download the data and replace the LFS pointer files with it.
This report investigates potential biases in stop-and-search practices by UK police officers, focusing on the impact of the individual’s sex, ethnicity, and age group, and on the police force area.
Ethnicity classification: we implement a “combined_ethnicity” variable that blends self-defined and officer-observed ethnicity, addressing the missing data (20% in 2023) in self-reported ethnicity. The census population data records only self-defined ethnicity, so it is used only for the force-level analysis.
Defining a “reasonable percentage”: we create a metric, the proportion of searches whose outcome was linked to the object of search, to evaluate whether searches correlate with legitimate policing concerns or reflect potential bias.
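As a minimal sketch, the metric can be computed directly from the API’s `outcome_linked_to_object_of_search` field; the toy values below are illustrative, not the report’s data:

```r
# "Reasonable percentage": share of searches whose outcome was linked
# to the object of the search. Toy data; the real data come from the API.
searches <- data.frame(
  outcome_linked_to_object_of_search = c(TRUE, FALSE, TRUE, NA, FALSE)
)
reasonable_pct <- mean(searches$outcome_linked_to_object_of_search, na.rm = TRUE) * 100
reasonable_pct # 50: 2 of the 4 non-missing searches were outcome-linked
```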
Data discrepancies, such as the binary gender classification in the 2021 Census, were noted for their potential impact on the gender-based analysis.
Unavailability of data from certain police forces
## [1] "Greater Manchester Police: Due to a change in IT systems no crime, outcome or stop and search data is available from July 2019 onwards. The force are working to rectify this issue and provide the missing data over the coming months."
## [1] "Gwent Police: The force are currently rectifying the issues with the Stop and Search Data and are looking to provide the data once the issues are rectified."
## [1] "The Police Service of Northern Ireland does not currently provide stop and search data."
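These forces are excluded before rates are computed. A minimal sketch of the exclusion step (the force ids follow the data.police.uk naming used in this report; the search counts are made up):

```r
# Exclude forces flagged by data.police.uk as having no usable
# stop-and-search data in the study window.
unavailable_forces <- c("greater-manchester", "gwent", "northern-ireland")
force_counts <- data.frame(
  force    = c("gwent", "metropolitan", "kent", "greater-manchester"),
  searches = c(10, 500, 120, 7)
)
available <- subset(force_counts, !force %in% unavailable_forces)
available$force # "metropolitan" "kent"
```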
The analysis used SQL-driven data manipulation, focusing on search frequency, search rate per 1,000 population, and the proportion of outcome-linked searches against demographic variables, presented by year and in total.
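The SQL aggregation can be sketched as follows, using an in-memory SQLite database via DBI (as in the appendix); the table and column names mirror the appendix, but the rows here are toy data:

```r
library(DBI)
# In-memory SQLite stand-in for the report's database; toy rows only.
db <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(db, "stop_and_searches_2021_2023", data.frame(
  year = c(2021, 2021, 2021, 2022),
  gender = c("Male", "Male", "Female", "Male"),
  outcome_linked_to_object_of_search = c(1, 0, 1, 1)
))
# Search counts and outcome-linked percentage by gender and year.
res <- dbGetQuery(db, "
  SELECT year, gender,
         COUNT(*) AS searches,
         100.0 * AVG(outcome_linked_to_object_of_search) AS outcome_linked_pct
  FROM stop_and_searches_2021_2023
  GROUP BY year, gender
  ORDER BY year, gender
")
dbDisconnect(db)
res
```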
Assumption: a high search volume could be justified by police experience and local crime rates, so we introduce the reasonable search percentage to assess this. If a relatively low proportion of search outcomes are linked to the object of search while a relatively high number of searches is conducted, a potential bias may exist.
Pie chart shows the gender distribution of police searches.
Bar charts compare search frequency, searches per 1,000 population, and outcome-linked percentages by gender.
Line graphs show trends in search frequency and searches per 1,000 population by gender from 2021 to 2023.
A significant gender disparity exists in searches, with males constituting 89% of cases. Notably, while males are searched far more frequently, females show a higher rate of outcome-linked searches. This discrepancy raises questions about gender bias in search practices.
Pie chart shows the ethnicity distribution of police searches. White individuals account for the largest share of total searches (62.65%), followed by Black (19.54%) and Asian (12.44%) individuals.
Bar charts show search frequency, searches per 1,000 population, and outcome-linked percentages by ethnicity.
Line graphs show trends in search frequency and searches per 1,000 population by ethnicity from 2021 to 2023.
The data reveals a pronounced ethnic bias, particularly against Black individuals, who face search rates over seven times higher than White individuals. The outcome-linked rate for Black individuals is significantly lower, suggesting searches are less frequently justified. Trends show some reduction in this bias, but the disparity remains stark.
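The disparity figure quoted above is a ratio of per-1,000 search rates; a sketch with illustrative (not the report’s actual) counts:

```r
# Search rate per 1,000 population, and the ratio between groups.
# The counts below are illustrative, not the report's estimates.
rate_per_1000 <- function(searches, population) 1000 * searches / population
black_rate <- rate_per_1000(20000, 1200000)   # ~16.7 per 1,000
white_rate <- rate_per_1000(100000, 45000000) # ~2.2 per 1,000
black_rate / white_rate # 7.5: roughly seven times the White rate
```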
Pie chart shows the age distribution of police searches.
Bar charts compare search frequency, searches per 1,000 population, and outcome-linked percentages by age group.
Line graphs show trends in search frequency and searches per 1,000 population by age group from 2021 to 2023.
The 18-24 age group is searched most frequently, accounting for 31.73% of all searches, though this share declined from 2021 to 2023. The consistency of outcome-linked search proportions across age groups suggests a less pronounced age bias than for gender or ethnicity. Note that the over-34 group spans a much wider age range, so its results are less directly comparable.
Grouped bar charts show average search frequency and rate by age within each ethnic group from 2021 to 2023.
The charts highlight acute biases, particularly against Black individuals aged 18-24: this group faces markedly higher search rates, underscoring the compounded effect of age and ethnicity in policing.
Bar chart compares average search proportion from 2021 to 2023 with population proportion between males and females across age groups.
Males consistently experience higher search rates than females across all age groups. The disparity is particularly evident in the over-25 age groups, despite a larger female population there.
[Console output omitted: 44 police-force boundary KML files (avon-and-somerset through wiltshire) were read with sf, each a single MULTIPOLYGON feature with 2 fields, Geodetic CRS WGS 84.]
## [1] "Unmatched Names in force_df_sum:"
## character(0)
## [1] "Unmatched Names in all_kml_data:"
## [1] "greater-manchester" "gwent" "north-yorkshire"
## [4] "northern-ireland"
## [1] "Number of rows with 'MULTIPOLYGON Z EMPTY' geometry: 0"
Maps show search rates and average search frequency from 2021 to 2023 by police force area.
The City of London exhibits the highest search rates, followed by Merseyside and the Metropolitan area. Notably, the Metropolitan area leads in the total number of searches.
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `datetime = ymd_hms(datetime)`.
## Caused by warning:
## ! 25890 failed to parse.
Bar charts depict search frequency throughout the day and across days of the week.
Searches concentrate between 4 PM and 1 AM and decline in the morning, indicating a focus on evening and night-time policing. Searches are least frequent on Sundays and peak on Fridays. This uneven distribution across times and days could skew results, potentially overrepresenting behaviours or demographic groups present at those times.
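A sketch of the hour and weekday extraction underlying these charts, using lubridate (loaded with the tidyverse in the appendix); the timestamps are illustrative:

```r
library(lubridate)
# Parse ISO datetimes as returned by the API, then derive hour and weekday.
dt <- ymd_hms(c("2022-03-04T22:15:00", "2022-03-06T09:30:00"))
hour(dt)                             # 22  9
as.character(wday(dt, label = TRUE)) # "Fri" "Sun"
```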
# this chunk contains code that sets global options for the entire .Rmd.
# we use include=FALSE to suppress it from the top of the document, but it will still appear in the appendix.
knitr::opts_chunk$set(echo = FALSE) # actually set the global chunk options.
# we set echo=FALSE to suppress code such that it by default does not appear throughout the document.
# note: this is different from .Rmd default
library(httr)
library(jsonlite)
library(tidyverse)
library(jpeg) #to let us read .jpegs/.jpgs
library(grid) #to let us plot images
library(rvest)
library(dplyr)
library(DBI)
library(ggplot2)
options(repos = c(CRAN = "https://cran.rstudio.com/"))
install.packages("scales")
library(scales)
library(tmap)
library(classInt)
library(sf)
install.packages("ggspatial")
library(ggspatial)
library(gridExtra)
# use the API to scrape data; available date range: 2020-12 to 2023-11, exactly 36 months (3 years)
crimes_street_dates_url <- "https://data.police.uk/api/crimes-street-dates"
# fromJSON(crimes_street_dates_url) # run this to check the available data range (2020-12 to 2023-11)
# use the API to scrape all the police forces
forces_url <- "https://data.police.uk/api/forces"
forces<- fromJSON(forces_url)
# print(forces$id)
stop_and_searches_by_force_url <-"https://data.police.uk/api/stops-force?force=avon-and-somerset&date=2021-11"
# fromJSON(stop_and_searches_by_force_url)
# write a function that uses the API to scrape data
get_force_data_monthly <- function(year, month, force) {
  formatted_month <- formatC(month, width = 2, format = "d", flag = "0")
  url <- paste0("https://data.police.uk/api/stops-force?force=", force, "&date=", year, "-", formatted_month)
  response <- GET(url)
  if (status_code(response) == 200) {
    data <- fromJSON(rawToChar(response$content), flatten = TRUE)
    # check whether the request returned data
    if (is.data.frame(data) && nrow(data) > 0) {
      # add the police force id and year to the result data frame
      data$force <- force
      data$year <- year
      return(data)
    }
  }
  # return an empty data frame if there is no data or the request failed
  return(data.frame(year = integer(0), force = character(0), stringsAsFactors = FALSE))
}
# build an empty list to store data
all_data <- list()
# loop over every month from 2021 to 2023 (we use 2021 Census data to normalise for population scale)
years <- 2021:2023
for (force in forces$id) {
  for (year in years) {
    for (month in 1:12) {
      message("Fetching data for ", force, " in ", year, "-", month)
      monthly_data <- get_force_data_monthly(year, month, force)
      all_data[[length(all_data) + 1]] <- monthly_data
    }
  }
}
# combine all data frames into one, filling empty cells with NA
all_data_df <- bind_rows(all_data) # this is the original data
#print(head(all_data_df))
# store the data locally
write.csv(all_data_df, "all_data_df.csv", row.names = FALSE)
# import the stop-and-search data
stop_and_searches_2021_2023<- read.csv("all_data_df.csv")
#head(stop_and_searches_2021_2023)
# rearrange columns
stop_and_searches_2021_2023 <- stop_and_searches_2021_2023 %>%
  select(year, force, age_range, gender, self_defined_ethnicity, officer_defined_ethnicity,
         outcome_linked_to_object_of_search, datetime, everything())
# head(stop_and_searches_2021_2023)
# edit the data format to make it coherent across all data used
stop_and_searches_2021_2023 <- stop_and_searches_2021_2023 %>%
  mutate(
    combined_ethnicity = ifelse(
      !is.na(officer_defined_ethnicity),
      officer_defined_ethnicity,
      case_when(
        str_detect(self_defined_ethnicity, "Black|African|Caribbean") ~ "Black",
        str_detect(self_defined_ethnicity, "Asian") ~ "Asian or Asian British",
        str_detect(self_defined_ethnicity, "Mixed|Multiple") ~ "Mixed",
        # add more patterns as necessary
        TRUE ~ word(self_defined_ethnicity, 1) # fall back to the first word
      )
    ),
    combined_ethnicity = case_when(
      combined_ethnicity == "Asian" ~ "Asian or Asian British",
      combined_ethnicity == "Black" ~ "Black or Black British",
      combined_ethnicity == "Other" ~ "Other Ethnic Group",
      TRUE ~ combined_ethnicity
    )
  ) %>%
  relocate(combined_ethnicity, .after = officer_defined_ethnicity)
#install.packages(c("DBI", "RSQLite"))
library(DBI)
# Create database: This will create a file in our hard drive if it does not exist already.
db <- dbConnect(RSQLite::SQLite(), "database/assignment4.sqlite")
db_exists <- file.exists("database/assignment4.sqlite")
# db_exists
# write the API data into the database after manipulation
dbWriteTable(db, "stop_and_searches_2021_2023", stop_and_searches_2021_2023, row.names = FALSE, overwrite = TRUE)
stop_search_mar21_mar23 <- read.csv("stop-search-open-data-tables-mar21-mar23.csv")
dbWriteTable(db, "stop_search_mar21_mar23", stop_search_mar21_mar23, row.names = FALSE, overwrite = TRUE)
tables <- dbListTables(db)
print(tables)
dbListFields(db,"stop_search_mar21_mar23")
# check how many forces' stop-and-search data are included
unique_values_in_force <- unique(stop_and_searches_2021_2023$force)
#unique_values_in_force
library(dplyr)
# import 2021 census data, population by force and ethnicity
population_by_force_by_ethnicity_21census<- read.csv("population-by-force-by-ethnicity.csv")
population_by_force_by_ethnicity_21census <- population_by_force_by_ethnicity_21census %>%
  mutate(
    combinedethnicity = case_when(
      grepl("Mixed", Self.defined.ethnicity) ~ "Mixed", # collapse all "Mixed" subcategories
      TRUE ~ Self.defined.ethnicity
    ),
    Police.Force.Area = case_when(
      Police.Force.Area == "Avon and Somerset" ~ "avon-and-somerset",
      Police.Force.Area == "Devon and Cornwall" ~ "devon-and-cornwall",
      Police.Force.Area == "London, City of" ~ "city-of-london",
      Police.Force.Area == "Dyfed-Powys" ~ "dyfed-powys",
      Police.Force.Area == "North Wales" ~ "north-wales",
      Police.Force.Area == "South Wales" ~ "south-wales",
      Police.Force.Area == "Thames Valley" ~ "thames-valley",
      Police.Force.Area == "South Yorkshire" ~ "south-yorkshire",
      Police.Force.Area == "North Yorkshire" ~ "north-yorkshire",
      Police.Force.Area == "West Mercia" ~ "west-mercia",
      Police.Force.Area == "West Midlands" ~ "west-midlands",
      Police.Force.Area == "West Yorkshire" ~ "west-yorkshire",
      Police.Force.Area == "Greater Manchester" ~ "greater-manchester",
      Police.Force.Area == "Metropolitan Police" ~ "Metropolitan",
      TRUE ~ Police.Force.Area
    )
  )
population_by_force_by_ethnicity_21census <- population_by_force_by_ethnicity_21census %>%
filter(Census.Year != 2011)
population_by_force_by_ethnicity_21census$Police.Force.Area <- tolower(population_by_force_by_ethnicity_21census$Police.Force.Area)
population_by_force_by_ethnicity_21census$Population <- as.numeric(gsub(",", "", population_by_force_by_ethnicity_21census$Population))
#head(population_by_force_by_ethnicity_21census)
dbWriteTable(db, "population_by_force_by_ethnicity_21census", population_by_force_by_ethnicity_21census, row.names = FALSE, overwrite = TRUE)
tables <- dbListTables(db)
# print(tables)
# dbListFields(db,"population_by_force_by_ethnicity_21census")
library(rvest)
library(xml2)
# URL of the webpage to scrape
url <- "https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/age-groups/latest/"
# Read the HTML content from the website
webpage <- read_html(url)
# Extract the table
table <- html_node(webpage, "#main-content > div:nth-child(12) > div > div:nth-child(1) > div > div > div > table")
# Convert the table into a dataframe
df <- html_table(table, fill = TRUE)
# Setting the second row as header
colnames(df) <- as.character(unlist(df[2, ]))
# Removing the first two rows
df <- df[-c(1, 2), ]
# View the modified dataframe
# head(df)
df <- rename(df, Age = `Age brackets`)
# Gathering all columns except the first one into key-value pairs
age_ethnicity <- gather(df, key = "key", value = "value", -1)
# Creating separate columns for 'ethnicity' and 'type' (percentage or count)
age_ethnicity<- age_ethnicity %>%
separate(key, into = c("ethnicity", "type"), sep = " ") %>%
spread(type, value)
# Renaming columns for clarity
colnames(age_ethnicity) <- c("age_group", "ethnicity", "Percentage", "Population")
age_ethnicity$Population <- as.numeric(gsub(",", "", age_ethnicity$Population))
# View the transformed dataframe
# head(age_ethnicity)
# Function to map age groups
map_age_group <- function(age_group) {
if (age_group %in% c("Age 0 to 4", "Age 5 to 9")) {
return("under 10")
} else if (age_group %in% c("Age 10 to 14", "Age 15 to 17")) {
return("10-17")
} else if (age_group == "Age 18 to 24") {
return("18-24")
} else if (age_group %in% c("Age 25 to 29", "Age 30 to 34")) {
return("25-34")
} else {
return("over 34")
}
}
# Apply the function to the dataframe
age_ethnicity$age_group <- sapply(age_ethnicity$age_group, map_age_group)
age_ethnicity<- age_ethnicity %>%
mutate(
ethnicity = case_when(
ethnicity == "Asian" ~ "Asian or Asian British",
ethnicity == "Black" ~ "Black or Black British",
ethnicity == "Other" ~ "Other Ethnic Group",
TRUE ~ ethnicity # Keeps other values unchanged
)
)
age_ethnicity <- age_ethnicity %>%
filter(ethnicity != "All")
# View the modified dataframe
# head(age_ethnicity)
dbWriteTable(db, "age_ethnicity", age_ethnicity , row.names = FALSE, overwrite = TRUE)
library(dplyr)
library(RSelenium)
library(rvest)
library("netstat")
library(stringr)
driver <- rsDriver(browser = "firefox", port = free_port(random = TRUE), chromever = NULL)
remDr <- driver[["client"]]
url<-"https://data.police.uk/changelog/"
url2<-"https://data.police.uk/data/"
remDr$navigate(url)
html <- read_html(url)
greater_manchester<- html%>% html_elements(css=" #content > div:nth-child(2) > ul:nth-child(3) > li:nth-child(5)" )
greater_manchester_text <- greater_manchester%>% html_text()
greater_manchester_text <- trimws(greater_manchester_text)
gwent<- html%>% html_elements(css=" #content > div:nth-child(2) > ul:nth-child(3) > li:nth-child(6)" )
gwent_text <-gwent%>% html_text()
gwent_text <- trimws(gwent_text)
html <- read_html(url2)
north_ireland<- html%>% html_elements(css=" #downloads > p:nth-child(2)" )
north_ireland_text <- north_ireland%>% html_text()
writeLines(greater_manchester_text, "greater_manchester_text.txt")
writeLines(gwent_text, "gwent_text.txt")
writeLines(north_ireland_text, "north_ireland_text.txt")
greater_manchester_text <- readLines("greater_manchester_text.txt")
gwent_text <- readLines("gwent_text.txt")
north_ireland_text <- readLines("north_ireland_text.txt")
greater_manchester_text
gwent_text
north_ireland_text
ethnicity_age_sex_census<- read.csv("age-sex-ethnicity.csv")
# Selecting specific columns and renaming them
ethnicity_age_sex_census <- ethnicity_age_sex_census %>%
select(
age = `Age..86.categories..Code`,
ethnicity_group = `Ethnic.group..6.categories.`,
sex = `Sex..2.categories.`,
observation = Observation
)
# View the modified dataframe
# head(ethnicity_age_sex_census)
age_groups <- c("under 10", "10-17", "18-24", "25-34", "over 34")
ethnicity_age_sex_census <- ethnicity_age_sex_census %>%
mutate(age_group = cut(age,
breaks = c(-Inf, 10, 18, 25, 35, Inf), # right = FALSE gives [lower, upper) intervals, so the break must be 35 to keep 34-year-olds in "25-34"
labels = age_groups,
right = FALSE))
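As a quick sanity check on the binning, the following standalone sketch (with made-up ages, not data from the census file) shows how `cut()` behaves at each boundary: with `right = FALSE`, each interval is closed on the left and open on the right, so the upper break needs to be 35 if 34-year-olds are meant to land in "25-34".

```r
# Hypothetical ages straddling each bin boundary
ages <- c(9, 10, 17, 18, 24, 25, 34, 35)
bins <- cut(ages,
            breaks = c(-Inf, 10, 18, 25, 35, Inf),
            labels = c("under 10", "10-17", "18-24", "25-34", "over 34"),
            right = FALSE)
as.character(bins)
# "under 10" "10-17" "10-17" "18-24" "18-24" "25-34" "25-34" "over 34"
```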
ethnicity_age_sex_census <- ethnicity_age_sex_census %>%
mutate(
ethnicity_group = case_when(
ethnicity_group == "Asian, Asian British or Asian Welsh" ~ "Asian or Asian British",
ethnicity_group == "Black, Black British, Black Welsh, Caribbean or African" ~ "Black or Black British",
ethnicity_group == "Mixed or Multiple ethnic groups" ~ "Mixed",
ethnicity_group == "Other ethnic group" ~ "Other Ethnic Group",
TRUE ~ ethnicity_group # Keeps other values unchanged
)
)
ethnicity_age_sex_census <- select(ethnicity_age_sex_census, -age)
# head(ethnicity_age_sex_census)
dbWriteTable(db, "ethnicity_age_sex_census", ethnicity_age_sex_census , row.names = FALSE, overwrite = TRUE)
# A brief summary of the 2021 Census population totals for each research variable (gender, ethnicity, age group, and police force area). These totals are used to normalise search counts and remove the scale effect introduced by the raw number of searches.
age_population_df <- dbGetQuery(db,"
SELECT
age_group,
SUM(REPLACE(Population, ',', '') + 0) AS TotalPopulation
FROM age_ethnicity
GROUP BY age_group")
ethnicity_population_df <- dbGetQuery(db, "
SELECT
combinedethnicity,
SUM(REPLACE(Population, ',', '') + 0) AS TotalPopulation
FROM population_by_force_by_ethnicity_21census
GROUP BY combinedethnicity")
force_population_df <- dbGetQuery(db, "
SELECT
`Police.Force.Area`,
SUM(REPLACE(Population, ',', '') + 0) AS TotalPopulation
FROM population_by_force_by_ethnicity_21census
GROUP BY `Police.Force.Area`")
sex_population_df <- dbGetQuery(db, "
SELECT
sex,
SUM(REPLACE(observation, ',', '') + 0) AS TotalPopulation
FROM ethnicity_age_sex_census
GROUP BY sex")
age_population_df
ethnicity_population_df
force_population_df
sex_population_df
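As a minimal illustration of the per-1,000 normalisation carried out in the SQL queries in this report (hypothetical numbers, not taken from the data):

```r
# Hypothetical totals for one demographic group over a 3-year window
number_of_searches <- 1500
census_population  <- 250000
years              <- 3
# Searches per 1,000 population per year, mirroring the
# (COUNT(*) * 1000.0) / (3 * SUM(Population)) expressions in the queries
rate_per_1000_per_year <- (number_of_searches * 1000) / (years * census_population)
rate_per_1000_per_year
# 2
```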
# Identify the distinct values in the gender column of the original database
unique_values_in_gender <- unique(stop_and_searches_2021_2023$gender)
# unique_values_in_gender
sex_sum <- dbGetQuery(db, "
SELECT
gender,
COUNT(*) AS number_of_searches,
COUNT(*) / 3.0 AS average_number_of_searches_per_year,
(COUNT(*) * 1000.0) /
(3* (SELECT SUM(REPLACE(psex.observation, ',', '') + 0) FROM ethnicity_age_sex_census psex
WHERE psex.sex = ss.gender) )AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) /
(SELECT COUNT(*) FROM stop_and_searches_2021_2023
WHERE gender IS NOT NULL) AS search_percentage,
(SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) /
NULLIF(COUNT(*), 0) AS reasonable_search_percentage
FROM stop_and_searches_2021_2023 ss
WHERE gender IS NOT NULL
GROUP BY gender
")
# sex_sum
sex_df_year <- dbGetQuery(db, "
SELECT
year,
gender,
COUNT(*) AS number_of_searches,
(COUNT(*) * 1000.0) /
((SELECT SUM(REPLACE(psex.observation, ',', '') + 0) FROM ethnicity_age_sex_census psex
WHERE psex.sex = ss.gender) )AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) /
(SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE year = ss.year AND gender IS NOT NULL) AS search_percentage_per_year,
(SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) / NULLIF(COUNT(*), 0) AS reasonable_search_percentage
FROM stop_and_searches_2021_2023 ss
WHERE gender IS NOT NULL
GROUP BY year, gender
")
# sex_df_year
library(ggplot2)
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# Only install if the package is missing, so knitting does not reinstall every time
if (!requireNamespace("scales", quietly = TRUE)) install.packages("scales")
library(scales)
# Create the pie chart
pie_chart <- ggplot(sex_sum, aes(x = "", y = search_percentage, fill = gender)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
theme_void() +
scale_fill_brewer(palette = "Set1", name = "Gender", labels = paste(sex_sum$gender, ": ", round(sex_sum$search_percentage, 2), "%")) +
theme(legend.position = "bottom",legend.title = element_blank(),
plot.caption = element_text(hjust = 0.5)) +
labs(subtitle = "Percentage of Stop and Searches by Sex",
caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include other sexes.\n• Does not include search records where sex is unknown.") +
# theme() is a separate layer added with +; it cannot be passed as an argument to labs()
theme(legend.margin = margin(1, 1, 1, 1, unit = "pt"))
# Add text labels for all categories except "Other"
pie_chart <- pie_chart +
geom_text(data = subset(sex_sum, gender != "Other"), # Exclude "Other" here
aes(label = paste0(round(search_percentage, 2), "%")),
position = position_stack(vjust = 0.5),
color = "black", size = 3)
# Then, overlay the text label for "Other" closer to the pie
pie_chart <- pie_chart +
geom_text(data = subset(sex_sum, gender == "Other"),
aes(label = paste0(round(search_percentage, 2), "%"),
x = 1.2, y = 0), # Adjust position of "Other" label
color = "black", size = 3)+
theme(plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot",
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"))
# Display the pie chart
# pie_chart
# Filter out the "Other" category for displaying
filtered_sex_sum <- sex_sum %>%
filter(gender != "Other") %>%
droplevels() # Important: This drops unused factor levels
# Define colors for each ethnicity
base_colors <- c("Male" = "#1f77b4",
"Female" = "#d62728")
# Create a named vector for alpha values to be used for the legend
alpha_values <- c("Linked to search object" = 1, "No Link" = 0.45)
# Create the bar chart
bar_chart <- ggplot(filtered_sex_sum, aes(x = gender, y = number_of_searches, fill = gender)) +
geom_bar(stat = "identity", aes(alpha = "No Link")) +
geom_bar(stat = "identity",
aes(y = number_of_searches * reasonable_search_percentage / 100, alpha = "Linked to search object")) +
geom_text(aes(y = number_of_searches * reasonable_search_percentage / 100,
label = paste0(round(reasonable_search_percentage, 2), "%")),
vjust = 1.3, size=2.8,
color = "black") +
geom_text(aes(label = number_of_searches),
position = position_stack(vjust = 1.1),
color = "black") +
scale_y_continuous(labels = label_number())+
scale_fill_manual(values = base_colors) +
scale_alpha_manual(values = alpha_values,
guide = guide_legend(title = "Outcome")) +
theme_minimal() +
labs(
#theme(plot.title = element_text(size = 8)),
#caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include other sexes.\n• Does not include search records where sex is unknown.",
subtitle = "Total Number of Stop and Searches by Sex ",
#subtitle = "in financial year with outcome-linked searches percentage",
x = "Gender", y = "Total Number of Searches", fill = "Gender") +
theme(axis.text.x = element_text( size = 8),
plot.subtitle = element_text(hjust = 0.9,vjust=2))+
theme(legend.text = element_text(size = 8), # Adjust text size
legend.key.size = unit(0.5, "cm"))+
guides(fill = guide_legend(nrow = 1, title = "Gender"),
alpha = guide_legend(nrow = 1, title = "Outcome")) +
theme(legend.position = "bottom",
legend.box = "vertical",
)
#bar_chart
library(grid) # textGrob() and gpar() come from the grid package
grid.arrange(pie_chart, bar_chart, ncol = 2, widths = c(0.95, 1),
top = textGrob("Search Against Gender from 2021 to 2023 with Outcome-linked Percentage",
gp = gpar(fontface = "bold", fontsize = 14)))
library(ggplot2)
library(scales)
# Filter out the "Other" category
filtered_sex_df_year <- sex_df_year %>%
filter(gender != "Other") %>%
droplevels() # Important: This drops unused factor levels
# Convert 'year' to a factor to ensure proper dodging
filtered_sex_df_year$year <- factor(filtered_sex_df_year$year)
# Plotting the bar chart
ggplot(filtered_sex_df_year, aes(x = gender, y = searches_per_1000_population_per_year, fill = year)) +
geom_bar(stat = "identity", position = position_dodge()) +
geom_text(
aes(label = paste0(round(searches_per_1000_population_per_year, 2)), group = year),
vjust = -0.3,
position = position_dodge(width = 0.9),
color = "black",
size = 3
) +
scale_y_continuous(labels = scales::comma) +
labs(
caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include other sexes.\n• Does not include search records where sex is unknown.",
y = "Number of Searches per 1000 Population",x="Gender",
title = "Search Rate: Number of Searches per 1000 Population by Gender",
subtitle = "from Year 2021 to 2023 using 2021 Census Data"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),# Center-align the title
plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot",
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
)
library(ggplot2)
library(scales) # For formatting the y-axis labels
# Calculate the non-reasonable searches
filtered_sex_df_year$non_reasonable_searches <- filtered_sex_df_year$number_of_searches - (filtered_sex_df_year$number_of_searches * filtered_sex_df_year$reasonable_search_percentage / 100)
# Define alpha values for reasonable and non-reasonable searches
alpha_values <- c("Link to search object" = 1, "No link" = 0.5) # Adjust non-reasonable alpha here
# Plotting the bar chart
ggplot(filtered_sex_df_year, aes(x = gender, fill = year)) +
# Reasonable searches
geom_bar(aes(y = number_of_searches * reasonable_search_percentage / 100, alpha = "Link to search object"), stat = "identity", position = "dodge") +
# Non-reasonable searches (adjusted alpha value for better visibility)
geom_bar(aes(y = non_reasonable_searches, alpha = "No link"), stat = "identity", position = "dodge") +
# Labels for total number of searches
geom_text(
aes(y = number_of_searches, label = number_of_searches),
vjust = -0.3,
position = position_dodge(width = 0.9),
color = "black",
size = 3
) +
# Labels for percentage of reasonable searches
geom_text(
aes(y = number_of_searches * reasonable_search_percentage / 100, label = paste0(round(reasonable_search_percentage, 2), "%")),
vjust = -0.5,
position = position_dodge(width = 0.9),
color = "black",
size = 2.5
) +
scale_y_continuous(labels = scales::comma) +
scale_alpha_manual(values = alpha_values, guide = guide_legend(title = "Search Outcome")) +
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include other sexes.\n• Does not include search records where sex is unknown.", x = "Gender Group", y = "Number of Searches", title = "Number of Stop and Searches by Gender from Year 2021 to 2023") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
plot.caption = element_text(hjust = 0),
plot.caption.position = "plot",
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"))
library(ggplot2)
library(scales)
library(tidyr)
library(dplyr)
sex_df_year$year <- as.numeric(as.character(sex_df_year$year))
# Reshape the data to long format
long_sex_df_year <- sex_df_year %>%
pivot_longer(
cols = c(searches_per_1000_population_per_year, number_of_searches),
names_to = "metric",
values_to = "value"
)
long_sex_df_year <- long_sex_df_year %>%
filter(!is.na(year) & !is.na(value) & !is.na(gender))
# Plotting the line graph
line_chart <- ggplot(long_sex_df_year, aes(x = year, y = value, color = gender, group = gender)) +
geom_line() +
geom_point() + # Add points to the line
geom_text(
aes(label = paste0(round(value, 2))),
vjust = -0.5, # Adjust this value to position the text above the points
color = "black",
size = 2.1,
check_overlap = TRUE # Avoid overlapping text
) +
scale_x_continuous(breaks = unique(sex_df_year$year)) + # Set breaks at unique years
scale_y_continuous(labels = scales::comma) +
facet_wrap(~ metric, scales = "free_y") +
labs(
caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where sex is unknown.",
x = "Year", y = "Value",
title = "Search Rate and Total Number of Searches per 1000 Population by Gender",
subtitle = "from Financial Year 2020 to 2022 using 2021 Census Data"
) +
theme_minimal() +
theme(
legend.position = "bottom",
plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0),
plot.caption = element_text(hjust = 0),
plot.caption.position = "plot",
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
)
# Display the line graph
print(line_chart)
unique_values_in_ethnicity <- unique(stop_and_searches_2021_2023$combined_ethnicity)
# unique_values_in_ethnicity
ethnicity_df_year <- dbGetQuery(db, "
SELECT
year,
combined_ethnicity as ethnicity,
COUNT(*) AS number_of_searches,
(COUNT(*) * 1000.0) /
((SELECT SUM(REPLACE(p.Population, ',', '') + 0)
FROM population_by_force_by_ethnicity_21census p
WHERE p.combinedethnicity = ss.combined_ethnicity) )AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE year = ss.year AND combined_ethnicity IS NOT NULL) AS search_percentage,
SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE combined_ethnicity IS NOT NULL
GROUP BY year,combined_ethnicity
")
# ethnicity_df_year
ethnicity_df <- dbGetQuery(db, "
SELECT
combined_ethnicity as ethnicity,
COUNT(*) AS number_of_searches,
COUNT(*) / 3.0 AS average_number_of_searches_per_year,
(COUNT(*) * 1000.0) /
(3*(SELECT SUM(REPLACE(p.Population, ',', '') + 0)FROM population_by_force_by_ethnicity_21census p
WHERE p.combinedethnicity = ss.combined_ethnicity) )AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE combined_ethnicity IS NOT NULL) AS search_percentage,
SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0 /
NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE combined_ethnicity IS NOT NULL
GROUP BY combined_ethnicity
")
# ethnicity_df
library(ggplot2)
library(scales)
# Assuming ethnicity_df is your dataframe already loaded in R
# Pie Chart for search_percentage with labels
pie_chart <- ggplot(ethnicity_df, aes(x = "", y = search_percentage, fill = ethnicity)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(round(search_percentage, 1), "%")),
position = position_stack(vjust = 0.5),
color = "black",size=2.4) +
theme_void() +
scale_fill_brewer(palette = "Set1", name = "Ethnicity", labels = paste(ethnicity_df$ethnicity, ": ", round(ethnicity_df$search_percentage, 2),
"%"))+
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.",
fill = "Ethnicity",
title = "Percentage of Searches by Ethnicity from Year 2021 to 2023")+
theme(
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot" # Center the caption box itself at the bottom of the plot
) +
theme(plot.margin = margin(1, 1, 1, 1, "cm"))# Center-align the subtitle
pie_chart
# Define colors for each ethnicity
base_colors <- c("Asian or Asian British" = "#d62728",
"Black or Black British" = "#1f77b4",
"Other Ethnic Group" = "purple",
"White" = "#ff7f0e",
"Mixed" = "#2ca02c")
# Create a named vector for alpha values to be used for the legend
alpha_values <- c("Linked to search object" = 1, "No Link" = 0.45)
# Create the bar chart
bar_chart <- ggplot(ethnicity_df, aes(x = ethnicity, y = number_of_searches, fill = ethnicity)) +
geom_bar(stat = "identity", aes(alpha = "No Link")) +
geom_bar(stat = "identity",
aes(y = number_of_searches * percentage_reasonable / 100, alpha = "Linked to search object")) +
# Labels for percentage of reasonable searches
geom_text(aes(y = number_of_searches * percentage_reasonable / 100,
label = paste0(round(percentage_reasonable, 2), "%")),
vjust = 0.8,
color = "black",size=2) +
# Labels for total number of searches)
geom_text(aes(label = number_of_searches),
vjust = -0.05,
color = "black",size=3) +
scale_y_continuous(labels = comma) +
scale_fill_manual(values = base_colors) +
scale_alpha_manual(values = alpha_values,
guide = guide_legend(title = "Search Outcome")) +
theme_minimal() +
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.",
title = "Total Number of Stop and Searches by Ethnicity from 2021 to 2023",
subtitle = "with outcome-linked searches percentage",
x = "Ethnicity", y = "Number of Searches", fill = "Ethnicity") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
plot.subtitle = element_text(hjust = 0.5))+ # Center-align the subtitle
theme(plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot" # Center the caption box itself at the bottom of the plot
) +
theme(plot.margin = margin(0.5,0.5, 0.5, 0.5, "cm"))# Center-align the subtitle
bar_chart
library(ggplot2)
library(scales) # For formatting the y-axis labels
# Convert 'year' to a factor to ensure proper dodging
ethnicity_df_year$year <- factor(ethnicity_df_year$year)
# Plotting the bar chart
ggplot(ethnicity_df_year, aes(x = ethnicity, y = searches_per_1000_population_per_year, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(
aes(label = paste0(round(searches_per_1000_population_per_year, 2)), group = year),
#aes(label = as.integer(searches_per_1000_population_per_year)), # Format as integer
vjust = -0.3,
position = position_dodge(width = 0.9),
color = "black",
size = 3 # Adjust the text size here
) +
scale_y_continuous(labels = scales::comma) + # Use comma formatting for y-axis
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.", x = "Ethnic Group", y = "Number of Searches per 1000 Population", title = "Search Rate: Number of Searches per 1000 Population by Ethnicity",
subtitle = "from Financial Year 2021 to 2023 using 2021 Census Data") +
theme_minimal() +
theme(plot.caption = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5), # Center-align the title
)+
theme(
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot" # Center the caption box itself at the bottom of the plot
) +
theme(plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"))# Center-align the subtitle
library(ggplot2)
library(scales) # For formatting the y-axis labels
# Convert 'year' to a factor to ensure proper dodging
ethnicity_df_year$year <- factor(ethnicity_df_year$year)
# Calculate the non-reasonable searches
ethnicity_df_year$non_reasonable_searches <- ethnicity_df_year$number_of_searches - (ethnicity_df_year$number_of_searches * ethnicity_df_year$percentage_reasonable / 100)
# Define alpha values for reasonable and non-reasonable searches
alpha_values <- c("Linked to search object" = 1, "No link" = 0.5) # Adjust non-reasonable alpha here
# Plotting the bar chart
ggplot(ethnicity_df_year, aes(x = ethnicity, fill = year)) +
# Reasonable searches
geom_bar(aes(y = number_of_searches * percentage_reasonable / 100, alpha = "Linked to search object"), stat = "identity", position = "dodge") +
# Non-reasonable searches (adjusted alpha value for better visibility)
geom_bar(aes(y = non_reasonable_searches, alpha = "No link"), stat = "identity", position = "dodge") +
# Labels for total number of searches
geom_text(
aes(y = number_of_searches, label = number_of_searches),
vjust = -0.4,
position = position_dodge(width = 0.9),
color = "black",
size = 2
) +
# Labels for percentage of reasonable searches
geom_text(
aes(y = number_of_searches * percentage_reasonable / 100, label = paste0(round(percentage_reasonable, 2), "%")),
vjust = 0.7,
position = position_dodge(width = 0.9),
color = "black",
size = 2.5
) +
scale_y_continuous(labels = scales::comma) +
scale_alpha_manual(values = alpha_values, guide = guide_legend(title = "Search Outcome")) +
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.", x = "Ethnic Group", y = "Number of Searches", title = "Total Number of Stop and Searches by Ethnicity", subtitle = "from Year 2021 to 2023 with outcome-linked search percentage") +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1)
)+
theme(plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot" # Center the caption box itself at the bottom of the plot
) +
theme(plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"))# Center-align the subtitle
library(ggplot2)
library(scales)
library(tidyr)
library(dplyr)
ethnicity_df_year$year <- as.numeric(as.character(ethnicity_df_year$year))
# Reshape the data to long format
long_ethnicity_df_year <- ethnicity_df_year %>%
pivot_longer(
cols = c(searches_per_1000_population_per_year, number_of_searches),
names_to = "metric",
values_to = "value"
)
# Plotting the line graph
line_chart <- ggplot(long_ethnicity_df_year, aes(x = year, y = value, color = ethnicity, group = ethnicity)) +
geom_line() +
geom_point() + # Add points to the line
geom_text(
aes(label = paste0(round(value, 2))),
vjust = -0.5, # Adjust this value to position the text above the points
color = "black",
size = 2.1,
check_overlap = TRUE # Avoid overlapping text
) +
scale_x_continuous(breaks = unique(long_ethnicity_df_year$year)) + # Set breaks at unique years
scale_y_continuous(labels = scales::comma) +
facet_wrap(~ metric, scales = "free_y") +
labs(
caption = "Note: \n• Does not include vehicle only searches. \n• Does not include searches record whose ethnicity is unknown.",
x = "Year", y = "Value",
title = "Search Rate and Total Number of Searches per 1000 Population by Ethnicity",
subtitle = "from Year 2021 to 2023 using 2021 Census Data"
) +
theme_minimal() +
theme(plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
)
# Display the line graph
print(line_chart)
unique_values_in_age <- unique(stop_and_searches_2021_2023$age_range)
#unique_values_in_age
age_sum <- dbGetQuery(db, "
SELECT
age_range,
COUNT(*) AS number_of_searches,
COUNT(*) / 3.0 AS average_number_of_searches,
(COUNT(*) * 1000.0) /
(3* (SELECT SUM(REPLACE(page.Population, ',', '') + 0)FROM age_ethnicity page
WHERE page.age_group = ss.age_range) )AS searches_per_1000_population,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE age_range IS NOT NULL ) AS search_percentage,
SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0 /
NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE age_range IS NOT NULL
GROUP BY age_range
")
age_df_year <- dbGetQuery(db, "
SELECT
year,
age_range,
COUNT(*) AS number_of_searches,
(COUNT(*) * 1000.0) /
((SELECT SUM(REPLACE(page.Population, ',', '') + 0)
FROM age_ethnicity page
WHERE page.age_group = ss.age_range) ) AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE year = ss.year AND age_range IS NOT NULL ) AS search_percentage,
SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0 / NULLIF(COUNT(*), 0)
AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE age_range IS NOT NULL
GROUP BY year,age_range
")
age_sum
age_df_year
library(ggplot2)
library(scales)
# Pie chart built from the age_sum dataframe
# Pie Chart for search_percentage with labels
pie_chart <- ggplot(age_sum, aes(x = "", y = search_percentage, fill = age_range)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(round(search_percentage, 2), "%")),
position = position_stack(vjust = 0.5),
color = "black") +
theme_void() +
scale_fill_brewer(palette = "Set1", name = "Age Group", labels = paste(age_sum$age_range, ": ", round(age_sum$search_percentage, 2),
"%"))+
theme(
plot.caption = element_text(hjust = 0),
plot.caption.position = "plot",
plot.margin = margin(1, 1, 1, 1, "cm") # Corrected quotation marks
)+
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown.", title = "Percentage of Stop and Searches by Age Group from 2021 to 2023")
pie_chart
# Define colors for each ethnicity
base_colors <- c("under 10"="#ff7f0e",
"10-17" = "#d62728",
"18-24" ="#1f77b4",
"25-34" = "#2ca02c",
"over 34" = "purple"
)
# Create a named vector for alpha values to be used for the legend
alpha_values <- c("Link to search object" = 1, "No link" = 0.45)
# Create the bar chart
bar_chart <- ggplot(age_sum, aes(x = age_range, y = number_of_searches, fill = age_range)) +
geom_bar(stat = "identity", aes(alpha = "No link")) +
geom_bar(stat = "identity",
aes(y = number_of_searches * percentage_reasonable / 100, alpha = "Link to search object")) +
geom_text(aes(y = number_of_searches * percentage_reasonable / 100,
label = paste0(round(percentage_reasonable, 2), "%")),
vjust = 1,
color = "black") +
geom_text(aes(label = number_of_searches),
vjust = -0.45, size=2.8,
color = "black") +
scale_y_continuous(labels = comma) +
scale_fill_manual(values = base_colors) +
scale_alpha_manual(values = alpha_values,
guide = guide_legend(title = "Outcome")) +
theme_minimal() +
labs(caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown.", title = "Total Number of Stop and Searches by Age Group from 2021 to 2023",
subtitle = "with outcome-linked searches percentage",
x = "Age", y = "Number of Searches", fill = "Age") +
theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8),
plot.subtitle = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0)) # Center-align the subtitle
bar_chart
library(ggplot2)
library(scales) # For formatting the y-axis labels
# Convert 'year' to a factor to ensure proper dodging
age_df_year$year <- factor(age_df_year$year)
# Plotting the bar chart
ggplot(age_df_year, aes(x = age_range, y = searches_per_1000_population_per_year, fill = year)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(
aes(label = as.integer(searches_per_1000_population_per_year)), # Format as integer
vjust = -0.3,
position = position_dodge(width = 0.9),
color = "black",
size = 3 # Adjust the text size here
) +
scale_y_continuous(labels = scales::comma) + # Use comma formatting for y-axis
labs(x = "Age Group", y = "Number of Searches per 1000 Population", title = "Number of Searches per 1000 Population by Age Group",
subtitle = "from Year 2021 to 2023 using 2021 Census Data") +
theme_minimal() +
theme(plot.caption = element_text(hjust = 0),
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5), # Center-align the title
plot.subtitle = element_text(hjust = 0.5) # Center-align the subtitle
)+labs(
caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown."
)
library(ggplot2)
library(scales) # For formatting the y-axis labels
# Convert 'year' to a factor to ensure proper dodging
age_df_year$year <- factor(age_df_year$year)
# Calculate the non-reasonable searches
age_df_year$non_reasonable_searches <- age_df_year$number_of_searches - (age_df_year$number_of_searches * age_df_year$percentage_reasonable / 100)
# Define alpha values for reasonable and non-reasonable searches
alpha_values <- c("Reasonable" = 1, "Non-Reasonable" = 0.5) # Adjust non-reasonable alpha here
# Plotting the bar chart
ggplot(age_df_year, aes(x = age_range, fill =year)) +
# Reasonable searches
geom_bar(aes(y = number_of_searches * percentage_reasonable / 100, alpha = "Reasonable"), stat = "identity", position = "dodge") +
# Non-reasonable searches (adjusted alpha value for better visibility)
geom_bar(aes(y = non_reasonable_searches, alpha = "Non-Reasonable"), stat = "identity", position = "dodge") +
# Labels for total number of searches
geom_text(
aes(y = number_of_searches, label = number_of_searches),
vjust = -0.9,
position = position_dodge(width = 0.9),
color = "black",
size = 2.1
) +
# Labels for percentage of reasonable searches
geom_text(
aes(y = number_of_searches * percentage_reasonable / 100, label = paste0(round(percentage_reasonable, 2), "%")),
vjust = 0.5,
position = position_dodge(width = 0.9),
color = "black",
size = 2.5
) +
scale_y_continuous(labels = scales::comma) +
scale_alpha_manual(values = alpha_values, guide = guide_legend(title = "Search Outcome")) +
labs(caption = "Note: \n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown.", x = "Age Group", y = "Number of Searches", title = "Number of Stop and Searches by Age Group from Year 2021 to 2023") +
theme_minimal() +
theme(
plot.caption = element_text(hjust = 0),
axis.text.x = element_text(angle = 45, hjust = 1)
)
library(ggplot2)
library(scales)
library(tidyr)
library(dplyr)
age_df_year$year <- as.numeric(as.character(age_df_year$year))
# Reshape the data to long format
long_age_df_year <- age_df_year %>%
pivot_longer(
cols = c(searches_per_1000_population_per_year, number_of_searches),
names_to = "metric",
values_to = "value"
)
# Plotting the line graph
line_chart <- ggplot(long_age_df_year, aes(x = year, y = value, color = age_range, group = age_range)) +
geom_line() +
geom_point() + # Add points to the line
geom_text(
aes(label = paste0(round(value, 2))),
vjust = -0.5, # Adjust this value to position the text above the points
color = "black",
size = 2.1,
check_overlap = TRUE # Avoid overlapping text
) +
scale_x_continuous(breaks = unique(long_age_df_year$year)) + # Set breaks at unique years
scale_y_continuous(labels = scales::comma) +
facet_wrap(~ metric, scales = "free_y") +
labs(
caption = "Note: \n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown.",
x = "Year", y = "Value",
title = "Search Rate and Total Number of Searches per 1000 Population by Age Group",
subtitle = "from Year 2021 to 2023 using 2021 Census Data"
) +
theme_minimal() +
theme(plot.subtitle = element_text(hjust = 0.5),# Center-align the subtitle
legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1),
plot.title = element_text(hjust = 0.5),
plot.caption = element_text(hjust = 0),
plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm")
)
# Display the line graph
print(line_chart)
# Get an overall view of the census population by age group and ethnicity
ethnicity_age<- dbGetQuery(db, "
SELECT age_group,
ethnicity_group,
SUM(observation) AS Population
FROM ethnicity_age_sex_census psex
GROUP BY Age_Group, ethnicity_group
")
ethnicity_age
ethnicity_age_sum <- dbGetQuery(db, "
SELECT
age_range,
combined_ethnicity,
COUNT(*) AS number_of_searches,
COUNT(*) / 3.0 AS average_number_of_searches,
(COUNT(*) * 1000.0) /
(3 * (SELECT SUM(REPLACE(pae.Population, ',', '') + 0) FROM age_ethnicity pae
WHERE pae.age_group = ss.age_range AND pae.ethnicity = ss.combined_ethnicity)) AS searches_per_1000_population,
(COUNT(*) * 100.0) / (SELECT COUNT(*) FROM stop_and_searches_2021_2023 WHERE age_range IS NOT NULL AND combined_ethnicity IS NOT NULL) AS search_percentage,
(SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) / NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE age_range IS NOT NULL AND combined_ethnicity IS NOT NULL
GROUP BY age_range,combined_ethnicity
")
ethnicity_age_sum$searches_per_1000_population <- format(ethnicity_age_sum$searches_per_1000_population, scientific = FALSE)
ethnicity_age_sum$search_percentage <- format(ethnicity_age_sum$search_percentage, scientific = FALSE)
ethnicity_age_sum
ethnicity_age_df_year <- dbGetQuery(db, "
SELECT
year,
age_range,
combined_ethnicity,
COUNT(*) AS number_of_searches,
(COUNT(*) * 1000.0) /
((SELECT SUM(REPLACE(pae.Population, ',', '') + 0) FROM age_ethnicity pae
WHERE pae.age_group = ss.age_range AND pae.ethnicity = ss.combined_ethnicity)) AS searches_per_1000_population_per_year,
(COUNT(*) * 100.0) /
(SELECT COUNT(*) FROM stop_and_searches_2021_2023 WHERE year = ss.year AND age_range IS NOT NULL AND combined_ethnicity IS NOT NULL) AS search_percentage,
(SUM(CASE WHEN outcome_linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) / NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE age_range IS NOT NULL AND combined_ethnicity IS NOT NULL
GROUP BY year,age_range,combined_ethnicity
")
ethnicity_age_df_year$search_percentage <- format(ethnicity_age_df_year$search_percentage, scientific = FALSE)
ethnicity_age_df_year
library(ggplot2)
ethnicity_age_sum$searches_per_1000_population <- as.numeric(ethnicity_age_sum$searches_per_1000_population)
# If there are no non-numeric characters, you can convert directly to numeric
# age_sex_sum$searches_per_1000_population <- as.numeric(age_sex_sum$searches_per_1000_population)
# Now round and convert to integer
ethnicity_age_sum$searches_per_1000_population <- as.integer(round(ethnicity_age_sum$searches_per_1000_population))
my_colors <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# Plotting the bar chart
ggplot(ethnicity_age_sum, aes(x = age_range, y = average_number_of_searches, fill = combined_ethnicity)) +
geom_bar(stat = "identity", position = "dodge", alpha = 1) +
scale_fill_manual(values = my_colors) +
scale_y_continuous(labels = label_number(big.mark = ",")) + # This will format numbers with commas
labs(
caption = "Note:\n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.\n• Does not include search records where age is unknown.",
title = "Average Number of Searches by Age Group and Ethnicity",
subtitle = "from year 2021 to 2023 based on 2021 Census",
x = "Age Group",
y = "Average Number of Searches"
) +
theme_minimal()+
theme(
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot", # Center the caption box itself at the bottom of the plot
plot.subtitle = element_text(hjust = 0.5))# Center-align the subtitle
# Plotting the bar chart
ggplot(ethnicity_age_sum, aes(x = age_range, y = searches_per_1000_population, fill = combined_ethnicity)) +
geom_bar(stat = "identity", position = "dodge", alpha = 1) +
scale_fill_manual(values = my_colors) +
scale_y_continuous(labels = label_number(big.mark = ",")) + # This will format numbers with commas
labs(
caption = "Note: \n• Does not include vehicle-only searches.\n• Does not include search records where ethnicity is unknown.\n• Does not include search records where age is unknown.",
title = "Average Searches per 1000 Population by Age Group and Ethnicity",
subtitle = "from year 2021 to 2023 based on 2021 Census",
x = "Age Group",
y = "Searches per 1000 Population"
) +
theme_minimal() +
theme(
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot", # Center the caption box itself at the bottom of the plot
plot.subtitle = element_text(hjust = 0.5))# Center-align the subtitle
# First summarise each table separately to check whether the figures calculated in the SQL are correct
age_sex<- dbGetQuery(db, "
SELECT age_group,
sex,
SUM(observation) AS Population
FROM ethnicity_age_sex_census psex
GROUP BY age_group, sex
")
age_sex_census<- dbGetQuery(db, "
SELECT age_range,
gender,
COUNT(*) AS number_of_searches
FROM stop_and_searches_2021_2023
GROUP BY age_range, gender
")
age_sex
age_sex_census
age_sex_sum <- dbGetQuery(db, "
SELECT
age_range,
gender,
COUNT(*) AS number_of_searches,
COUNT(*) / 3.0 AS average_number_of_searches,
(COUNT(*) * 1000.0) /
(3 * (SELECT SUM(REPLACE(pas.observation, ',', '') + 0)
FROM ethnicity_age_sex_census pas
WHERE pas.age_group = ss.age_range AND pas.sex = ss.gender)
) AS searches_per_1000_population,
(SELECT SUM(REPLACE(pas.observation, ',', '') + 0)
FROM ethnicity_age_sex_census pas
WHERE pas.age_group = ss.age_range AND pas.sex = ss.gender) * 100.0 /
(SELECT SUM(REPLACE(pas.observation, ',', '') + 0)
FROM ethnicity_age_sex_census pas) AS population_percentage,
(COUNT(*) * 100.0) /
(SELECT COUNT(*) FROM stop_and_searches_2021_2023 WHERE age_range IS NOT NULL AND gender IS NOT NULL AND gender NOT IN ('Other')) AS search_percentage,
(SUM(CASE WHEN outcome_Linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 1.0) / NULLIF(COUNT(*), 0) * 100 AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE age_range IS NOT NULL AND gender IS NOT NULL AND gender NOT IN ('Other')
GROUP BY age_range, gender
")
age_sex_sum$searches_per_1000_population <- format(age_sex_sum$searches_per_1000_population, scientific = FALSE)
age_sex_sum$search_percentage <- format(age_sex_sum$search_percentage, scientific = FALSE)
age_sex_sum
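As noted above, these summaries exist to sanity-check the SQL figures. A minimal dplyr cross-check could look like the sketch below; it assumes the `age_sex` query names its total `Population` and that the census `sex` labels match the search data's `gender` labels:

```r
library(dplyr)

# Sketch: recompute the per-1000 search rate in R and compare it with the
# SQL result in age_sex_sum. Assumes age_sex has a numeric Population
# column and that census sex labels match the search data's gender labels.
rate_check <- age_sex_census %>%
  inner_join(age_sex, by = c("age_range" = "age_group", "gender" = "sex")) %>%
  mutate(searches_per_1000_population = number_of_searches * 1000 / (3 * Population))
```

Any large discrepancy between `rate_check` and `age_sex_sum` would point to a join or unit error in the SQL.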
library(ggplot2)
library(dplyr)
library(tidyr)
age_sex_sum$population_percentage <- as.numeric(as.character(age_sex_sum$population_percentage))
age_sex_sum$search_percentage <- as.numeric(as.character(age_sex_sum$search_percentage))
# Convert the data to long format
df_long <- age_sex_sum %>%
pivot_longer(cols = c(population_percentage, search_percentage),
names_to = "metric", values_to = "percentage") %>%
mutate(percentage = if_else(metric == "population_percentage", -percentage, percentage),
metric = factor(metric, levels = c("population_percentage", "search_percentage"))) %>%
unite("metric_gender", metric, gender, sep = "_") # Combine metric and sex into one column
# Define the width for dodging
dodge_width <- 0.9
# Plot
ggplot(df_long, aes(x = age_range, y = percentage, fill = metric_gender)) +
geom_bar(stat = "identity", position = position_dodge(width = dodge_width)) +
coord_flip() +
scale_y_continuous(labels = abs,
breaks = seq(-100, 100, by = 10),
limits = c(-max(abs(df_long$percentage)), max(abs(df_long$percentage)))) +
scale_fill_manual(values = c("population_percentage_Female" = "red",
"population_percentage_Male" = "blue",
"search_percentage_Female" = "lightpink",
"search_percentage_Male" = "lightblue")) +
labs(x = "Age Group", y = "Percentage", fill = "Group",
title = "Average Search Percentage versus Population Percentage by Age Group and Gender",
subtitle = "from year 2021 to 2023 based on 2021 Census",
caption = "Note: \n• Does not include vehicle-only searches.\n• Does not include search records where age is unknown.\n• Does not include search records where gender is unknown.") +
theme_minimal() +
theme(plot.subtitle = element_text(hjust = 0.5), # Center-align the subtitle
legend.position = "right",
plot.title = element_text(size = 10),
plot.caption = element_text(hjust = 0), # Left-align the text within the caption
plot.caption.position = "plot" # Anchor the caption box to the bottom of the plot
)
unique_values_in_force <- unique(stop_and_searches_2021_2023$force)
force_df_sum <- dbGetQuery(db, "
SELECT force FROM stop_and_searches_2021_2023 GROUP BY force")
# to assess data consistency
unique_values_in_force
force_df_sum
force_df_sum <- dbGetQuery(db, "
SELECT
force,
COUNT(*) AS total_number_of_searches_21_23,
COUNT(*) / 3.0 AS average_number_of_searches,
(COUNT(*) * 1000.0) / (3 * (SELECT SUM(REPLACE(p.Population, ',', '') + 0)
FROM population_by_force_by_ethnicity_21census p
WHERE p.`Police.Force.Area` = ss.force))
AS searches_per_1000_population,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023)
AS search_percentage,
(SUM(CASE WHEN outcome_Linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) /
NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
GROUP BY ss.force
")
force_df_sum
force_df_year <- dbGetQuery(db, "
SELECT
force,
COUNT(*) AS number_of_searches,
(COUNT(*) * 1000.0) / (SELECT SUM(REPLACE(p.Population, ',', '') + 0)
FROM population_by_force_by_ethnicity_21census p
WHERE p.`Police.Force.Area` = ss.force)
AS searches_per_1000_population,
(COUNT(*) * 100.0) / (SELECT COUNT(*)
FROM stop_and_searches_2021_2023
WHERE year = ss.year AND force IS NOT NULL )
AS search_percentage,
(SUM(CASE WHEN outcome_Linked_to_object_of_search = TRUE THEN 1 ELSE 0 END) * 100.0) /
NULLIF(COUNT(*), 0) AS percentage_reasonable
FROM stop_and_searches_2021_2023 ss
WHERE ss.force IS NOT NULL
GROUP BY year,ss.force")
force_df_year
library(sf)
library(dplyr)
# Define the path to your folder containing KML files
folder_path <- "force kmls"
# List all KML files in the folder
kml_files <- list.files(folder_path, pattern = "\\.kml$", full.names = TRUE)
# Read all KML files and combine them into one sf object
all_kml_data <- lapply(kml_files, st_read) %>%
bind_rows()
# Now we can plot all the boundaries
# ggplot() +
#   geom_sf(data = all_kml_data)
# download and import the force boundary data from police.data.uk
library(sf)
library(dplyr)
# Define the path to your folder containing KML files
folder_path <- "force kmls"
# List all KML files in the folder
kml_files <- list.files(folder_path, pattern = "\\.kml$", full.names = TRUE)
# Read all KML files, assign the file name (without .kml extension) to the 'Name' column, and combine into one sf object
all_kml_data <- lapply(kml_files, function(file) {
kml_data <- st_read(file)
# Remove the .kml extension from the file name
kml_data$Name <- gsub(".kml", "", basename(file), fixed = TRUE)
return(kml_data)
}) %>% bind_rows()
# Now we can plot all the boundaries with names filled in
ggplot() + geom_sf(data = all_kml_data, aes(fill = Name)) # Fill based on the Name column
#store at local
saveRDS(all_kml_data, "all_kml_data.rds")
all_kml_data <- readRDS("all_kml_data.rds")
# Unique names from all_kml_data_update
unique_names <- unique(all_kml_data$Name)
# Unique police_force_area from force_df_sum
unique_police_force_areas <- unique(force_df_sum$force)
# Check unmatched names in both directions
unmatched_names_in_force_df_sum <- unique_police_force_areas[!unique_police_force_areas %in% unique_names]
unmatched_names_in_all_kml_data <- unique_names[!unique_names %in% unique_police_force_areas]
# Print unmatched names
print("Unmatched Names in force_df_sum:")
print(unmatched_names_in_force_df_sum)
print("Unmatched Names in all_kml_data:")
print(unmatched_names_in_all_kml_data)
## This chunk is a sample of how to use downloaded archive data (partial data from Gwent and North Yorkshire) as additional data for the research.
## Running it directly can fail because the combined data contains more rows than can comfortably be handled, so some manipulation is needed first.
## Below, filter() is used to remove rows containing NA; an alternative is to group the data and add a column called 'searches' holding the number of searches in each category.
## The appendix shows how to adapt the code to the statistical data covering March 2021 to March 2023.
library(readr)
library(dplyr)
# Function to read all CSVs in a folder, add a modified 'force' column based on the file name
read_csvs_and_add_modified_force <- function(folder) {
csv_files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
if (length(csv_files) == 0) {
message("No CSV files found in ", folder)
return(NULL) # Return NULL if no files are found
}
all_data <- lapply(csv_files, function(file) {
data <- read_csv(file)
# Extract the relevant part of the filename for 'force'
force_name <- sub("^\\d{4}-\\d{2}-(.*)-stop-and-search.*$", "\\1", basename(file))
data <- mutate(data, force = force_name)
return(data)
})
return(bind_rows(all_data))
}
# Folder path
folder_path <- "additional data" # Replace with your actual folder path
# Read and combine all CSVs from the folder
combined_data <- read_csvs_and_add_modified_force(folder_path)
colnames(combined_data)
colnames(stop_and_searches_2021_2023)
# Renaming columns in combined_data to match those in stop_and_searches_2021_2023
combined_data <- combined_data %>%
rename(
type = Type,
datetime = Date,
location.latitude = Latitude,
location.longitude = Longitude,
gender = Gender,
age_range = `Age range`,
self_defined_ethnicity = `Self-defined ethnicity`,
officer_defined_ethnicity = `Officer-defined ethnicity`,
legislation = Legislation,
object_of_search = `Object of search`,
outcome_object.name = Outcome,
outcome_linked_to_object_of_search = `Outcome linked to object of search`,
force = force
)
##
combined_data_filtered <- combined_data %>%
filter( !is.na(gender) & !is.na(age_range) )
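The grouping alternative mentioned in the chunk comment, collapsing the raw rows into counts per search category, can be sketched as follows; the column names are assumed from the renamed `combined_data` above, and `searches` is a new summary column:

```r
library(dplyr)

# Sketch: instead of keeping one row per stop and search, group the data
# and add a `searches` column holding the count in each category. This
# shrinks the data to a size dplyr can handle comfortably.
combined_data_grouped <- combined_data %>%
  filter(!is.na(gender) & !is.na(age_range)) %>%
  group_by(force, gender, age_range, object_of_search) %>%
  summarise(searches = n(), .groups = "drop")
```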
# Joining combined_data and stop_and_searches_2021_2023
# Replace 'type' with the appropriate column(s) for joining
joined_data <- inner_join(combined_data, stop_and_searches_2021_2023, by = "type")
map_to_plot<- right_join(force_df_sum , all_kml_data, by = c("force" = "Name"))
# After the right join, map_to_plot includes all rows from all_kml_data and the matching rows from force_df_sum, including any with empty geometry.
# This will remove rows where the geometry is empty (including 'MULTIPOLYGON Z EMPTY')
map_to_plot_filtered<- map_to_plot %>%
filter(!st_is_empty(geometry))
library(sf)
# Check if map_to_plot is still an sf object
#class(map_to_plot_filtered)
# If map_to_plot is not an sf object,I need to convert it back
if (!"sf" %in% class(map_to_plot_filtered)) {
map_to_plot_filtered <- st_as_sf(map_to_plot_filtered)
}
# Check the class again
#class(map_to_plot_filtered)
# Check rows with empty geometry
rows_with_empty_geometry <- map_to_plot_filtered[which(st_is_empty(st_geometry(map_to_plot_filtered))), ]
# Count the number of such rows
num_rows_with_empty_geometry <- nrow(rows_with_empty_geometry)
# Print the count
print(paste("Number of rows with 'MULTIPOLYGON Z EMPTY' geometry:", num_rows_with_empty_geometry))
# Optionally, view the rows with empty geometry
if (num_rows_with_empty_geometry > 0) {
print(rows_with_empty_geometry$force)
}
library("sf")
library("tmap")
library("classInt")
library(sf)
library(ggplot2)
library(dplyr)
# install.packages("ggspatial") # uncomment on first run
library(ggspatial)
# Identify the highest value and second highest value
highest_value <- max(map_to_plot_filtered$searches_per_1000_population, na.rm = TRUE)
second_highest_value <- sort(map_to_plot_filtered$searches_per_1000_population, decreasing = TRUE)[2]
highest_location <- st_centroid(map_to_plot_filtered[which.max(map_to_plot_filtered$searches_per_1000_population), ])
# Plot with ggplot
map_rate <- ggplot(map_to_plot_filtered) +
geom_sf(aes(fill = searches_per_1000_population)) +
geom_sf(data = highest_location, color = "red", size = 3, shape = 8) + # Shape 8 is a star
scale_fill_viridis_c(
name = "Relative Search Rate per 1000 Population",
limits = c(0, second_highest_value), # Set the limits excluding the highest value
option = "D"
) +
labs(title = "Average Search Rate Distribution by Police Force from 2021 to 2023",
caption = "Note: \n• The red star indicates the force area with the highest number of searches per 1000 population: City of London at 295. \n• Grey areas indicate forces whose stop and search data is missing due to technical issues at DATA.POLICE.UK"
) +
theme_minimal() +
theme(legend.position = "right", plot.caption = element_text(hjust = 0))+
annotation_scale(location = "bl")
# Print the map
print(map_rate)
library("plotly")
#ggplotly(map_rate) #interactive map plot to help discover regional difference in detail
library(ggplot2)
library(sf)
library(viridis)
library(ggplot2)
library(sf)
# Identify the highest value and its index
highest_value <- max(map_to_plot_filtered$average_number_of_searches, na.rm = TRUE)
highest_index <- which.max(map_to_plot_filtered$average_number_of_searches)
# Create a subset for the highest value area
highest_area <- map_to_plot_filtered[highest_index, ]
# Plot with ggplot
map_rate <- ggplot(map_to_plot_filtered) +
geom_sf(aes(fill = average_number_of_searches), lwd = 0.2) + # Fill based on the average number of searches
scale_fill_viridis_c(
name = "Average Number of Searches",
option = "D",
limits = c(0, 40000), # Set fixed limits for the color scale
breaks = c(0, 10000, 20000, 30000, 40000), # Set breaks for the legend
oob = scales::oob_squish # Keep out-of-bounds data points
) +
geom_sf(data = highest_area, fill = "red", lwd = 0.2) + # Overlay the highest area in red
labs(title = "Average Search Frequency Distribution by Police Force from 2021 to 2023",
caption = paste("Note: \n• The red area indicates the force area with the highest average searches from 2021 to 2023:",
"City of London at", highest_value, ".\n• Grey areas indicate forces whose stop and search data is missing due to technical issues at DATA.POLICE.UK")
) +
theme_minimal() +
theme(legend.position = "right", plot.caption = element_text(hjust = 0)) +
annotation_scale(location = "bl")
# Print the map
print(map_rate)
library("plotly")
# ggplotly(map_rate) #interactive map plot to help discover regional difference in detail
# Alternative method: use tmap to draw the graph; with an interactive map we can directly identify each force and its figure to help write the report
# Load required libraries
library(sf)
library(tmap)
library(dplyr)
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("average_number_of_searches")
# Let's improve the map by creating more visual variation with our bins:
breaks <- classInt::classIntervals(map_to_plot_filtered$average_number_of_searches, n = 5, style = "jenks", dig.lab=10)
breaks
map_to_plot_filtered <- map_to_plot_filtered |>
mutate(number_of_searches_cut = cut(average_number_of_searches, breaks$brks, include.lowest = TRUE, dig.lab=10))
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("average_number_of_searches", palette = "Reds", title = "Average Number of Searches") +
tm_layout(main.title = "UK Police Stop and Search, 2021-2023", legend.outside = TRUE, frame = FALSE)+
tm_compass(position = c("right", "bottom")) +
tm_scale_bar(position = c("right", "bottom"))
# Finally, we can create a dynamic "live" map using tmaps:
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
map
tmap_save(map, "UK_stop_and_search.html")
tmap_mode("plot")
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("searches_per_1000_population")
# Let's improve the map by creating more visual variation with our bins:
breaks <- classInt::classIntervals(map_to_plot_filtered$searches_per_1000_population, n = 5, style = "jenks", dig.lab=10)
breaks
map_to_plot_filtered <- map_to_plot_filtered |>
mutate(number_of_searches_cut = cut(searches_per_1000_population, breaks$brks, include.lowest = TRUE, dig.lab=10))
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("searches_per_1000_population", palette = "Reds", title = "Number of Searches Per 1000 Population") +
tm_layout(main.title = "UK Police Stop and Search, 2021-2023", legend.outside = TRUE, frame = FALSE)+
tm_compass(position = c("right", "bottom")) +
tm_scale_bar(position = c("right", "bottom"))
# Finally, we can create a dynamic "live" map using tmaps:
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
map
tmap_save(map, "UK_stop_and_search.html")
tmap_mode("plot")
library(dplyr)
library(lubridate)
# Parse the datetime, extract the hour, and create the daytime range
stop_and_searches_2021_2023 <- stop_and_searches_2021_2023 %>%filter(!is.na(datetime)) %>%
mutate(datetime = ymd_hms(datetime), # Parse the datetime string
hour = hour(datetime), # Extract the hour
daytime = sprintf("%02d:00-%02d:59", hour, hour)) # Format hour range
# Count the number of records for each hour range
hourly_counts <- stop_and_searches_2021_2023 %>%
group_by(daytime) %>%
summarise(count = n())
# Calculate the total number of occurrences
total_count <- sum(hourly_counts$count)
# Calculate the percentage for each hour range
hourly_counts <- hourly_counts %>%
mutate(percentage = (count / total_count) * 100)
# View the result
#print(hourly_counts)
library(ggplot2)
# Plotting the bar chart
ggplot(hourly_counts, aes(x = daytime, y = count)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(aes(label = count), vjust = -0.3, color = "black", size = 2.5) +
labs(x = "Day Time",
y = "Number of searches",
title = "Frequency of Searches by Hour of Day") +
theme_minimal()+
theme(axis.text.x = element_text(angle = 90, hjust = 1)) # Rotate x-axis labels for better readability
library(dplyr)
library(lubridate)
# Add a new column for the day of the week
stop_and_searches_2021_2023 <- stop_and_searches_2021_2023 %>%filter(!is.na(datetime)) %>%
mutate(datetime = ymd_hms(datetime), # Parse the datetime string
day_of_week = wday(datetime, label = TRUE)) # Extract day of week with label
# Count the number of searches for each day of the week
day_of_week_counts <- stop_and_searches_2021_2023 %>%
group_by(day_of_week) %>%
summarise(count = n())
# Calculate the total number of searches
total_searches <- sum(day_of_week_counts$count)
# Calculate the percentage for each day of the week
day_of_week_counts <- day_of_week_counts %>%
mutate(percentage = (count / total_searches) * 100)
# View the result
#print(day_of_week_counts)
library(ggplot2)
# Plotting the bar chart
ggplot(day_of_week_counts, aes(x = day_of_week, y = count)) +
geom_bar(stat = "identity", fill = "steelblue") +
geom_text(aes(label = count), vjust = -0.3, color = "black", size = 2.5) +
labs(x = "Day",
y = "Number of searches",
title = "Frequency of Searches by Day of Week") +
theme_minimal()
# this chunk generates the complete code appendix.
# eval=FALSE tells R not to run (``evaluate'') the code here (it was already run before).
# Below shows how to use open statistical data from financial year to plot the map, which has the data of missing forces
#install.packages(c("DBI", "RSQLite"))
library(DBI)
stop_search_mar21_mar23 <- read.csv("stop-search-open-data-tables-mar21-mar23.csv")
dbWriteTable(db, "stop_search_mar21_mar23", stop_search_mar21_mar23, row.names = FALSE, overwrite = TRUE)
tables <- dbListTables(db)
#print(tables)
#dbListFields(db,"stop_search_mar21_mar23")
force_df_sum2 <- dbGetQuery(db, "
SELECT
police_force_area,
SUM(ss.number_of_searches) AS number_of_searches,
SUM(ss.number_of_searches) / 3.0 AS average_number_of_searches,
(SUM(ss.number_of_searches) * 1000.0) /
(3 * (SELECT SUM(REPLACE(p.Population, ',', '') + 0) FROM population_by_force_by_ethnicity_21census p
WHERE p.`Police.Force.Area` = ss.police_force_area)) AS searches_per_1000_population,
(SUM(ss.number_of_searches) * 100.0) / (SELECT SUM(number_of_searches) FROM stop_search_mar21_mar23 WHERE police_force_area NOT IN ('Unknown', 'N/A - vehicle search') ) AS search_percentage,
SUM(CASE WHEN link = 'Linked' THEN ss.number_of_searches ELSE 0 END) * 100.0 / NULLIF(SUM(ss.number_of_searches), 0) AS percentage_reasonable
FROM stop_search_mar21_mar23 ss
WHERE police_force_area NOT IN ('Unknown', 'N/A - vehicle search')
GROUP BY police_force_area
")
force_df_sum2
force_df_year2 <- dbGetQuery(db, "
SELECT
financial_year,
police_force_area,
SUM(ss.number_of_searches) AS number_of_searches,
(SUM(ss.number_of_searches) * 1000.0) /
((SELECT SUM(REPLACE(p.Population, ',', '') + 0) FROM population_by_force_by_ethnicity_21census p
WHERE p.`Police.Force.Area` = ss.police_force_area)) AS searches_per_1000_population,
(SUM(ss.number_of_searches) * 100.0) / (SELECT SUM(number_of_searches) FROM stop_search_mar21_mar23 WHERE financial_year = ss.financial_year AND police_force_area NOT IN ('Unknown', 'N/A - vehicle search') ) AS search_percentage,
SUM(CASE WHEN link = 'Linked' THEN ss.number_of_searches ELSE 0 END) * 100.0 / NULLIF(SUM(ss.number_of_searches), 0) AS percentage_reasonable
FROM stop_search_mar21_mar23 ss
WHERE police_force_area NOT IN ('Unknown', 'N/A - vehicle search')
GROUP BY financial_year,police_force_area
")
force_df_year2
# Update the 'Name' column
library(dplyr)
library(stringr)
all_kml_data_update <- all_kml_data %>%
mutate(Name = gsub("-", " ", Name), # Replace "-" with space
Name = gsub(" and ", " & ", Name), # Replace " and " with " & "
Name = str_to_title(Name), # Capitalize each word
# Handle specific cases
Name = if_else(Name == "City Of London", "London, City of", Name),
Name = if_else(Name == "Dyfed Powys", "Dyfed-Powys", Name),
Name = if_else(Name == "Metropolitan", "Metropolitan Police", Name))
all_kml_data_update
#Now the dataframe `all_kml_data_update` is ready to do a join with `force_df_sum` based on the column matching ` police_force_area == Name`
# Unique names from all_kml_data_update
unique_names <- unique(all_kml_data_update$Name)
# Unique police_force_area from force_df_sum
unique_police_force_areas2 <- unique(force_df_sum2$police_force_area)
# Check unmatched names in both directions
unmatched_names_in_force_df_sum2 <- unique_police_force_areas2[!unique_police_force_areas2 %in% unique_names]
unmatched_names_in_all_kml_data_update <- unique_names[!unique_names %in% unique_police_force_areas2]
# Print unmatched names
print("Unmatched Names in force_df_sum:")
print(unmatched_names_in_force_df_sum2)
print("Unmatched Names in all_kml_data_update:")
print(unmatched_names_in_all_kml_data_update)
# The official boundary data does not contain a geometry for the British Transport Police, so I ran a web scrape to collect its introduction and jurisdiction from Wikipedia. It turned out to be a national special police force that polices the railway network of England, Wales and Scotland, so its boundary is effectively the whole of Great Britain.
library(dplyr)
library(RSelenium)
library(rvest)
library("netstat")
library(stringr)
driver <- rsDriver(browser = "firefox", port = free_port(random = TRUE), chromever = NULL)
remDr <- driver[["client"]]
url<-"https://en.wikipedia.org/wiki/British_Transport_Police"
remDr$navigate(url)
html <- read_html(url)
Intro_BTP <- html%>% html_elements(css=" .mw-content-ltr > p:nth-child(5)" )
Intro_BTP_text <- Intro_BTP %>% html_text()
Intro_BTP_text
# Scrape Operations jurisdiction information using CSS selector
operations_jurisdiction <- html %>% html_elements(".infobox-data")
operations_jurisdiction <- operations_jurisdiction[html %>% html_elements(".infobox-label") %>% html_text() == "Operations jurisdiction"] %>% html_text()
print(operations_jurisdiction)
# Scrape Headquarters coordinates using CSS selector
coordinates <- html %>% html_elements(".geo-default") %>% html_text()
coordinates <- if (length(coordinates) > 0) coordinates[[1]] else "N/A"
# Extract the part after the slash and remove unnecessary characters
coordinates <- sub(".* / ", "", coordinates)
coordinates <- gsub("; ", " ", coordinates)
print(coordinates)
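Since the British Transport Police's jurisdiction is effectively the whole of Great Britain, one option (a sketch, assuming `all_kml_data_update` defined above has valid or repairable geometries) is to represent its boundary as the union of all the other force boundaries:

```r
library(sf)
library(dplyr)

# Sketch: give British Transport Police a boundary covering all of GB by
# taking the union of the other force boundaries. st_make_valid() guards
# against invalid polygons before the union.
btp_geometry <- st_union(st_make_valid(st_geometry(all_kml_data_update)))
btp_row <- st_sf(Name = "British Transport Police", geometry = btp_geometry)
all_kml_data_with_btp <- bind_rows(all_kml_data_update, btp_row)
```

Whether to include such a GB-wide polygon on the choropleth is a judgment call, since it would overlap every other force area.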
#combine 2 dataframe together using full join
map_to_plot2<- full_join(force_df_sum2 , all_kml_data_update, by = c("police_force_area" = "Name"))
# After the full join, map_to_plot2 includes all rows from both force_df_sum2 and all_kml_data_update, including any with empty geometry.
# Filter out rows with empty geometry
map_to_plot_filtered2 <- map_to_plot2 %>%
filter(!st_is_empty(geometry))
# This will remove rows where the geometry is empty (including 'MULTIPOLYGON Z EMPTY')
library(sf)
# Check if map_to_plot is still an sf object
class(map_to_plot_filtered2)
# If map_to_plot is not an sf object,I need to convert it back
if (!"sf" %in% class(map_to_plot_filtered2)) {
map_to_plot_filtered2 <- st_as_sf(map_to_plot_filtered2)
}
# Check the class again
class(map_to_plot_filtered2)
# Check rows with empty geometry
rows_with_empty_geometry <- map_to_plot_filtered2[which(st_is_empty(st_geometry(map_to_plot_filtered2))), ]
# Count the number of such rows
num_rows_with_empty_geometry <- nrow(rows_with_empty_geometry)
# Print the count
print(paste("Number of rows with 'MULTIPOLYGON Z EMPTY' geometry:", num_rows_with_empty_geometry))
# Optionally, view the rows with empty geometry
if (num_rows_with_empty_geometry > 0) {
print(rows_with_empty_geometry$police_force_area)
}
library(sf)
library(ggplot2)
library(dplyr)
if (!requireNamespace("ggspatial", quietly = TRUE)) install.packages("ggspatial")
library(ggspatial)
# Identify the highest and second-highest values
highest_value <- max(map_to_plot_filtered2$searches_per_1000_population, na.rm = TRUE)
second_highest_value <- sort(map_to_plot_filtered2$searches_per_1000_population, decreasing = TRUE)[2]
highest_location <- st_centroid(map_to_plot_filtered2[which.max(map_to_plot_filtered2$searches_per_1000_population), ])
# Plot with ggplot
map_rate <- ggplot(map_to_plot_filtered2) +
geom_sf(aes(fill = searches_per_1000_population)) +
geom_sf(data = highest_location, color = "red", size = 3, shape = 8) + # Shape 8 is a star
scale_fill_viridis_c(
name = "Searches per 1000 Population",
limits = c(0, second_highest_value), # Set the limits excluding the highest value
option = "D"
) +
labs(
caption = "Note: \n• The red star marks the force area with the highest rate of searches per 1000 population: City of London, at 295. \n• Grey areas indicate where stop-and-search data is missing due to technical issues at DATA.POLICE.UK"
) +
theme_minimal() +
theme(legend.position = "right", plot.caption = element_text(hjust = 0))+
annotation_scale(location = "bl")
# Print the map
print(map_rate)
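Capping the fill limits at `second_highest_value` stops the City of London outlier from compressing the rest of the colour scale. The index arithmetic behind it, on a toy vector containing an `NA` (as the British Transport Police row does), is:

```r
rates <- c(4.5, 295.2, 34.5, 26.8, NA, 7.2)   # toy searches-per-1000 values

highest <- max(rates, na.rm = TRUE)           # 295.2, the outlier
second  <- sort(rates, decreasing = TRUE)[2]  # 34.5; sort() drops NA by default
which.max(rates)                              # 2: row to mark with the red star
```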
library("plotly")
# ggplotly(map_rate)  # interactive map plot to help explore regional differences in detail
library(sf)
library(dplyr)
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("average_number_of_searches")
# Let's improve the map by creating more visual variation with our bins:
breaks <- classInt::classIntervals(map_to_plot_filtered$average_number_of_searches, n = 5, style = "jenks", dig.lab=10)
breaks
map_to_plot_filtered <- map_to_plot_filtered |>
mutate(number_of_searches_cut = cut(average_number_of_searches, breaks$brks, include.lowest = TRUE, dig.lab=10))
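`cut()` turns the continuous counts into an ordered factor over the Jenks break points; `include.lowest = TRUE` puts the minimum in the first bin and `dig.lab = 10` keeps the labels out of scientific notation. A small sketch using the break points printed in the output below and hand-picked counts:

```r
counts <- c(1500, 8000, 20000, 30000, 60000)          # illustrative values
brks   <- c(1307, 7565, 17296, 27260, 49307, 169396)  # shape of breaks$brks

bins <- cut(counts, brks, include.lowest = TRUE, dig.lab = 10)
table(bins)  # one count falls in each bin
```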
map <- tm_shape(map_to_plot_filtered) +
tm_polygons("average_number_of_searches", breaks = breaks$brks, palette = "Reds", title = "Average Number of Searches") +
tm_layout(main.title = "UK Police Stop and Search, 2020-2023", legend.outside = TRUE, frame = FALSE)
map
tmap_save(map, "UK_stop_and_search.png", dpi = 300)
# Finally, we can create a dynamic "live" map using tmap:
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
map
tmap_save(map, "UK_stop_and_search.html")
tmap_mode("plot")
## police_force_area number_of_searches average_number_of_searches
## 1 Avon & Somerset 23637 7879
## 2 Bedfordshire 11690 3896
## 3 British Transport Police 35476 11825
## 4 Cambridgeshire 9082 3027
## 5 Cheshire 23771 7923
## 6 Cleveland 19696 6565
## 7 Cumbria 10528 3509
## 8 Derbyshire 5992 1997
## 9 Devon & Cornwall 20603 6867
## 10 Dorset 6686 2228
## 11 Durham 8002 2667
## 12 Dyfed-Powys 15209 5069
## 13 Essex 61911 20637
## 14 Gloucestershire 7360 2453
## 15 Greater Manchester 51693 17231
## 16 Gwent 9176 3058
## 17 Hampshire 32679 10893
## 18 Hertfordshire 24103 8034
## 19 Humberside 18730 6243
## 20 Kent 39072 13024
## 21 Lancashire 34623 11541
## 22 Leicestershire 19180 6393
## 23 Lincolnshire 9730 3243
## 24 London, City of 7602 2534
## 25 Merseyside 147522 49174
## 26 Metropolitan Police 706344 235448
## 27 Norfolk 16694 5564
## 28 North Wales 14000 4666
## 29 North Yorkshire 9538 3179
## 30 Northamptonshire 9862 3287
## 31 Northumbria 14747 4915
## 32 Nottinghamshire 14344 4781
## 33 South Wales 33250 11083
## 34 South Yorkshire 44434 14811
## 35 Staffordshire 15278 5092
## 36 Suffolk 13606 4535
## 37 Surrey 17293 5764
## 38 Sussex 20400 6800
## 39 Thames Valley 47044 15681
## 40 Warwickshire 5394 1798
## 41 West Mercia 14621 4873
## 42 West Midlands 80983 26994
## 43 West Yorkshire 55603 18534
## 44 Wiltshire 5699 1899
## searches_per_1000_population search_percentage percentage_reasonable
## 1 4.516284 1.3183765 17.222998
## 2 5.529257 0.6520210 19.033362
## 3 NA 1.9787081 19.810576
## 4 3.384322 0.5065573 21.096675
## 5 7.233768 1.3258504 13.449161
## 6 11.530610 1.0985634 10.773761
## 7 7.020815 0.5872093 11.559650
## 8 1.891427 0.3342096 20.594126
## 9 3.840951 1.1491522 13.837791
## 10 2.858089 0.3729181 20.909363
## 11 4.234783 0.4463193 13.046738
## 12 9.826859 0.8482966 8.488395
## 13 11.093974 3.4531457 9.072701
## 14 3.803205 0.4105111 21.970109
## 15 6.008539 2.8832269 11.694040
## 16 5.204593 0.5118002 14.625109
## 17 5.451213 1.8227027 13.133817
## 18 6.701963 1.3443680 14.587396
## 19 6.671090 1.0446838 12.909770
## 20 7.017809 2.1792784 14.424652
## 21 7.537472 1.9311312 9.924039
## 22 5.698234 1.0697830 15.683003
## 23 4.221079 0.5427001 15.436793
## 24 295.200373 0.4240089 24.533018
## 25 34.549627 8.2281817 3.743848
## 26 26.782383 39.3970172 3.297827
## 27 6.074128 0.9311239 12.154067
## 28 6.793650 0.7808635 11.478571
## 29 3.885196 0.5319911 14.971692
## 30 4.186374 0.5500626 18.535794
## 31 3.395360 0.8225281 10.829321
## 32 4.163260 0.8000504 17.909927
## 33 8.415514 1.8545508 9.714286
## 34 10.771840 2.4783492 9.882072
## 35 4.488992 0.8521452 15.185234
## 36 5.962100 0.7588878 15.522564
## 37 4.791182 0.9645337 14.236974
## 38 3.986806 1.1378297 17.107843
## 39 6.234617 2.6239244 18.788793
## 40 3.012861 0.3008555 27.790137
## 41 3.749381 0.8155004 19.725053
## 42 9.245727 4.5169049 11.678994
## 43 7.881655 3.1013109 8.512131
## 44 2.554194 0.3178672 17.231093
## financial_year police_force_area number_of_searches
## 1 2020/21 Avon & Somerset 10386
## 2 2020/21 Bedfordshire 4028
## 3 2020/21 British Transport Police 12885
## 4 2020/21 Cambridgeshire 3551
## 5 2020/21 Cheshire 5831
## 6 2020/21 Cleveland 6350
## 7 2020/21 Cumbria 3594
## 8 2020/21 Derbyshire 2214
## 9 2020/21 Devon & Cornwall 8420
## 10 2020/21 Dorset 2678
## 11 2020/21 Durham 2379
## 12 2020/21 Dyfed-Powys 3463
## 13 2020/21 Essex 26043
## 14 2020/21 Gloucestershire 3202
## 15 2020/21 Greater Manchester 11741
## 16 2020/21 Gwent 4842
## 17 2020/21 Hampshire 12782
## 18 2020/21 Hertfordshire 9881
## 19 2020/21 Humberside 7424
## 20 2020/21 Kent 13903
## 21 2020/21 Lancashire 11165
## 22 2020/21 Leicestershire 6932
## 23 2020/21 Lincolnshire 3535
## 24 2020/21 London, City of 2743
## 25 2020/21 Merseyside 48818
## 26 2020/21 Metropolitan Police 316747
## 27 2020/21 Norfolk 7382
## 28 2020/21 North Wales 6201
## 29 2020/21 North Yorkshire 3797
## 30 2020/21 Northamptonshire 2835
## 31 2020/21 Northumbria 5017
## 32 2020/21 Nottinghamshire 5089
## 33 2020/21 South Wales 16248
## 34 2020/21 South Yorkshire 18841
## 35 2020/21 Staffordshire 5930
## 36 2020/21 Suffolk 5250
## 37 2020/21 Surrey 7454
## 38 2020/21 Sussex 8729
## 39 2020/21 Thames Valley 18982
## 40 2020/21 Warwickshire 2415
## 41 2020/21 West Mercia 6042
## 42 2020/21 West Midlands 25895
## 43 2020/21 West Yorkshire 21047
## 44 2020/21 Wiltshire 2223
## 45 2021/22 Avon & Somerset 6915
## 46 2021/22 Bedfordshire 3671
## 47 2021/22 British Transport Police 10467
## 48 2021/22 Cambridgeshire 2665
## 49 2021/22 Cheshire 5997
## 50 2021/22 Cleveland 6210
## 51 2021/22 Cumbria 2277
## 52 2021/22 Derbyshire 1735
## 53 2021/22 Devon & Cornwall 5856
## 54 2021/22 Dorset 2115
## 55 2021/22 Durham 2209
## 56 2021/22 Dyfed-Powys 2669
## 57 2021/22 Essex 18248
## 58 2021/22 Gloucestershire 1946
## 59 2021/22 Greater Manchester 9642
## 60 2021/22 Gwent 1977
## 61 2021/22 Hampshire 9791
## 62 2021/22 Hertfordshire 7335
## 63 2021/22 Humberside 6083
## 64 2021/22 Kent 12139
## 65 2021/22 Lancashire 10632
## 66 2021/22 Leicestershire 5633
## 67 2021/22 Lincolnshire 2887
## 68 2021/22 London, City of 2672
## 69 2021/22 Merseyside 46059
## 70 2021/22 Metropolitan Police 212107
## 71 2021/22 Norfolk 5335
## 72 2021/22 North Wales 4191
## 73 2021/22 North Yorkshire 2518
## 74 2021/22 Northamptonshire 3156
## 75 2021/22 Northumbria 4243
## 76 2021/22 Nottinghamshire 4466
## 77 2021/22 South Wales 9900
## 78 2021/22 South Yorkshire 12854
## 79 2021/22 Staffordshire 4658
## 80 2021/22 Suffolk 4217
## 81 2021/22 Surrey 4974
## 82 2021/22 Sussex 6183
## 83 2021/22 Thames Valley 13843
## 84 2021/22 Warwickshire 1766
## 85 2021/22 West Mercia 4334
## 86 2021/22 West Midlands 26372
## 87 2021/22 West Yorkshire 16329
## 88 2021/22 Wiltshire 1694
## 89 2022/23 Avon & Somerset 6336
## 90 2022/23 Bedfordshire 3991
## 91 2022/23 British Transport Police 12124
## 92 2022/23 Cambridgeshire 2866
## 93 2022/23 Cheshire 11943
## 94 2022/23 Cleveland 7136
## 95 2022/23 Cumbria 4657
## 96 2022/23 Derbyshire 2043
## 97 2022/23 Devon & Cornwall 6327
## 98 2022/23 Dorset 1893
## 99 2022/23 Durham 3414
## 100 2022/23 Dyfed-Powys 9077
## 101 2022/23 Essex 17620
## 102 2022/23 Gloucestershire 2212
## 103 2022/23 Greater Manchester 30310
## 104 2022/23 Gwent 2357
## 105 2022/23 Hampshire 10106
## 106 2022/23 Hertfordshire 6887
## 107 2022/23 Humberside 5223
## 108 2022/23 Kent 13030
## 109 2022/23 Lancashire 12826
## 110 2022/23 Leicestershire 6615
## 111 2022/23 Lincolnshire 3308
## 112 2022/23 London, City of 2187
## 113 2022/23 Merseyside 52645
## 114 2022/23 Metropolitan Police 177490
## 115 2022/23 Norfolk 3977
## 116 2022/23 North Wales 3608
## 117 2022/23 North Yorkshire 3223
## 118 2022/23 Northamptonshire 3871
## 119 2022/23 Northumbria 5487
## 120 2022/23 Nottinghamshire 4789
## 121 2022/23 South Wales 7102
## 122 2022/23 South Yorkshire 12739
## 123 2022/23 Staffordshire 4690
## 124 2022/23 Suffolk 4139
## 125 2022/23 Surrey 4865
## 126 2022/23 Sussex 5488
## 127 2022/23 Thames Valley 14219
## 128 2022/23 Warwickshire 1213
## 129 2022/23 West Mercia 4245
## 130 2022/23 West Midlands 28716
## 131 2022/23 West Yorkshire 18227
## 132 2022/23 Wiltshire 1782
## searches_per_1000_population search_percentage percentage_reasonable
## 1 5.953309 1.4527622 15.357212
## 2 5.715615 0.5634244 16.757696
## 3 NA 1.8023147 17.873496
## 4 3.969740 0.4967031 18.220220
## 5 5.323306 0.8156226 11.661808
## 6 11.152423 0.8882187 9.401575
## 7 7.190200 0.5027178 13.466889
## 8 2.096605 0.3096876 20.099368
## 9 4.709141 1.1777640 13.610451
## 10 3.434324 0.3745905 19.454817
## 11 3.777012 0.3327673 13.114754
## 12 6.712554 0.4843939 9.240543
## 13 14.000115 3.6428158 7.971432
## 14 4.963802 0.4478860 18.457214
## 15 4.094148 1.6422954 14.990205
## 16 8.239094 0.6772843 12.515489
## 17 6.396530 1.7879074 13.135660
## 18 8.242388 1.3821243 12.711264
## 19 7.932649 1.0384466 11.301185
## 20 7.491446 1.9447094 11.393224
## 21 7.291905 1.5617263 8.472906
## 22 6.178336 0.9696271 15.002885
## 23 4.600672 0.4944651 14.455446
## 24 319.547996 0.3836825 23.076923
## 25 34.299501 6.8285136 3.797780
## 26 36.030204 44.3056088 2.559772
## 27 8.057844 1.0325717 9.834733
## 28 9.027305 0.8673771 9.192066
## 29 4.639994 0.5311128 12.562549
## 30 3.610334 0.3965512 20.000000
## 31 3.465353 0.7017627 10.524218
## 32 4.431155 0.7118339 15.445078
## 33 12.337017 2.2727209 7.656327
## 34 13.702496 2.6354219 7.743750
## 35 5.227069 0.8294704 14.451939
## 36 6.901592 0.7343541 13.257143
## 37 6.195594 1.0426429 12.463107
## 38 5.117769 1.2209860 15.293848
## 39 7.546903 2.6551445 20.608998
## 40 4.046751 0.3378029 21.159420
## 41 4.648196 0.8451366 17.725919
## 42 8.869199 3.6221140 10.708631
## 43 8.950156 2.9439905 6.575759
## 44 2.988932 0.3109465 14.035088
## 45 3.963714 1.3023335 17.700651
## 46 5.209043 0.6913762 19.803868
## 47 NA 1.9712978 19.422948
## 48 2.979261 0.5019116 22.326454
## 49 5.474852 1.1294423 15.591129
## 50 10.906543 1.1695576 11.095008
## 51 4.555394 0.4288378 13.746157
## 52 1.643003 0.3267605 22.247839
## 53 3.275146 1.1028872 13.439208
## 54 2.712321 0.3983276 21.229314
## 55 3.507112 0.4160310 13.942961
## 56 5.173493 0.5026649 11.165230
## 57 9.809703 3.4367290 9.135248
## 58 3.016727 0.3664990 27.286742
## 59 3.362215 1.8159218 15.764364
## 60 3.364041 0.3723374 16.034396
## 61 4.899736 1.8439837 13.277500
## 62 6.118603 1.3814340 14.587594
## 63 6.499772 1.1456391 12.181489
## 64 6.540938 2.2861932 14.622292
## 65 6.943800 2.0023730 10.205041
## 66 5.020566 1.0608886 16.882656
## 67 3.757324 0.5437219 17.353654
## 68 311.276794 0.5032299 22.642216
## 69 32.361029 8.6745014 3.669207
## 70 24.127327 39.9470780 3.548209
## 71 5.823435 1.0047649 10.946579
## 72 6.101183 0.7893101 13.171081
## 73 3.077036 0.4742264 15.647339
## 74 4.019123 0.5943839 20.025349
## 75 2.930734 0.7991035 11.430592
## 76 3.888689 0.8411021 18.092253
## 77 7.517016 1.8645121 10.141414
## 78 9.348330 2.4208524 10.697059
## 79 4.105849 0.8772624 14.877630
## 80 5.543622 0.7942068 16.741760
## 81 4.134275 0.9367761 14.957780
## 82 3.625062 1.1644726 17.224648
## 83 5.503729 2.6071153 20.371307
## 84 2.959239 0.3325988 23.669309
## 85 3.334208 0.8162420 20.973696
## 86 9.032574 4.9667590 11.398453
## 87 6.943845 3.0753150 8.916651
## 88 2.277665 0.3190387 17.768595
## 89 3.631828 1.1583117 19.760101
## 90 5.663114 0.7296121 20.621398
## 91 NA 2.2164412 22.203893
## 92 3.203964 0.5239459 23.517097
## 93 10.903145 2.1833518 13.246253
## 94 12.532865 1.3045632 11.715247
## 95 9.316851 0.8513664 9.018682
## 96 1.934672 0.3734897 19.725893
## 97 3.538567 1.1566664 14.509246
## 98 2.427623 0.3460676 22.609614
## 99 5.420226 0.6241282 12.419449
## 100 17.594529 1.6594059 7.414344
## 101 9.472105 3.2211889 10.635641
## 102 3.429085 0.4043854 22.377939
## 103 10.569254 5.5411031 9.122402
## 104 4.010645 0.4308934 17.776835
## 105 5.057372 1.8475219 12.992282
## 106 5.744897 1.2590425 17.278931
## 107 5.580850 0.9548394 16.044419
## 108 7.021042 2.3820710 17.475058
## 109 8.376710 2.3447769 10.954312
## 110 5.895801 1.2093170 15.374150
## 111 4.305240 0.6047499 14.812576
## 112 254.776328 0.3998150 28.669410
## 113 36.988349 9.6242617 3.759141
## 114 20.189618 32.4477197 4.315736
## 115 4.341106 0.7270527 18.078954
## 116 5.252462 0.6595942 13.442350
## 117 3.938557 0.5892107 17.282035
## 118 4.929665 0.7076744 16.249031
## 119 3.789993 1.0031024 10.643339
## 120 4.169936 0.8754979 20.359156
## 121 5.392510 1.2983475 13.827091
## 122 9.264694 2.3288721 12.222309
## 123 4.134056 0.8573993 16.417910
## 124 5.441084 0.7566686 17.153902
## 125 4.043677 0.8893918 16.217883
## 126 3.217587 1.0032852 19.861516
## 127 5.653219 2.5994373 14.818201
## 128 2.032592 0.2217538 46.990932
## 129 3.265739 0.7760469 21.295642
## 130 9.835409 5.2496970 12.811673
## 131 7.750962 3.3321572 10.385692
## 132 2.395986 0.3257752 20.707071
## Simple feature collection with 44 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XYZ
## Bounding box: xmin: -8.624268 ymin: 49.8821 xmax: 1.768976 ymax: 55.81167
## z_range: zmin: 0 zmax: 0
## Geodetic CRS: WGS 84
## First 10 features:
## Name Description geometry
## 1 Avon & Somerset MULTIPOLYGON Z (((-3.383222...
## 2 Bedfordshire MULTIPOLYGON Z (((-0.354841...
## 3 Cambridgeshire MULTIPOLYGON Z (((0.435156 ...
## 4 Cheshire MULTIPOLYGON Z (((-2.699278...
## 5 London, City of MULTIPOLYGON Z (((-0.078929...
## 6 Cleveland MULTIPOLYGON Z (((-1.145833...
## 7 Cumbria MULTIPOLYGON Z (((-2.981906...
## 8 Derbyshire MULTIPOLYGON Z (((-1.571625...
## 9 Devon & Cornwall MULTIPOLYGON Z (((-3.7702 5...
## 10 Dorset MULTIPOLYGON Z (((-1.900296...
## [1] "Unmatched Names in force_df_sum:"
## [1] "British Transport Police"
## [1] "Unmatched Names in all_kml_data_update:"
## [1] "Northern Ireland"
## [1] "data.frame"
## [1] "sf" "data.frame"
## [1] "Number of rows with 'MULTIPOLYGON Z EMPTY' geometry: 0"
## Warning: st_centroid assumes attributes are constant over geometries
## st_as_s2(): dropping Z and/or M coordinate
## Scale on map varies by more than 10%, scale bar may be inaccurate
## Warning in
## classInt::classIntervals(map_to_plot_filtered$average_number_of_searches, : var
## has missing values, omitted in finding classes
## style: jenks
## one of 82,251 possible partitions of this variable into 5 classes
## [1307,7565] (7565,17296] (17296,27260] (27260,49307] (49307,169396]
## 30 7 1 1 1
## Map saved to /Users/shengyuwen/Downloads/MY472_final_assignment-main/UK_stop_and_search.png
## Resolution: 2160.642 by 2041.06 pixels
## Size: 7.20214 by 6.803533 inches (300 dpi)
## tmap mode set to interactive viewing